Hi Rui,

Thanks for your comments.

> In my opinion, we need to reduce the number of times RocksDB
> SST files are fetched from remote to local. The FLIP seems to batch the
> RocksDB put/get requests. I am not sure this will reduce the number of SST fetches.

For multiple Get (read) requests, we will convert them into one MultiGet
request. MultiGet can merge the block-read IO belonging to multiple Get
requests, thereby reducing the number of times RocksDB blocks/SST files are
fetched from the remote filesystem.
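To make the idea concrete, here is a small illustrative sketch (in Python, purely for illustration; the class and method names are hypothetical and not the actual FLIP-426 API) of how several buffered Get requests can be served by a single batched lookup, so duplicate keys and overlapping reads are merged:

```python
# Illustrative sketch only: simulates grouping N individual Get requests
# into one batched MultiGet-style lookup. The dict-backed "storage" is a
# stand-in for RocksDB; all names here are hypothetical.

class BatchingStateBackend:
    def __init__(self, storage):
        self.storage = storage      # dict acting as the key/value store
        self.pending = []           # buffered (key, result-slot) pairs
        self.multiget_calls = 0     # number of batched lookups issued

    def get(self, key):
        """Buffer a Get request instead of reading immediately."""
        slot = {}
        self.pending.append((key, slot))
        return slot

    def flush(self):
        """Issue one batched lookup for all buffered keys (duplicates merged)."""
        keys = list(dict.fromkeys(k for k, _ in self.pending))  # dedupe, keep order
        self.multiget_calls += 1
        results = {k: self.storage.get(k) for k in keys}        # one batched read
        for key, slot in self.pending:
            slot["value"] = results[key]
        self.pending.clear()


backend = BatchingStateBackend({"a": 1, "b": 2})
slots = [backend.get(k) for k in ("a", "b", "a")]  # three Gets, two unique keys
backend.flush()                                    # one batched lookup serves all three
```

The point of the sketch is only the grouping step: three logical Gets collapse into a single storage round trip, which is where the saving in remote block/SST fetches comes from.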

> How do we monitor the I/O used by state disaggregation?

On the one hand, the remote storage (S3/HDFS/OSS) generally provides
network/IO throughput monitoring. On the other hand, on the Flink side we
can also provide some metrics about accessing remote storage, e.g. the ratio
of remote storage accesses to local disk cache accesses.
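For example, such a ratio metric could be as simple as a pair of counters (again a hypothetical Python sketch, not actual Flink metric code, where real Flink code would register a Counter/Gauge via the operator's metric group):

```python
# Hypothetical sketch of a remote-vs-local access metric; not actual Flink code.
class StateAccessMetrics:
    def __init__(self):
        self.local_cache_hits = 0   # reads served from the local disk cache
        self.remote_reads = 0       # reads that went to remote storage (S3/HDFS/OSS)

    def record(self, served_from_cache):
        if served_from_cache:
            self.local_cache_hits += 1
        else:
            self.remote_reads += 1

    def remote_ratio(self):
        """Fraction of state accesses that hit remote storage."""
        total = self.local_cache_hits + self.remote_reads
        return self.remote_reads / total if total else 0.0


m = StateAccessMetrics()
for hit in (True, True, True, False):   # three cache hits, one remote read
    m.record(hit)
```

A high remote ratio would indicate the local cache is too small or the access pattern is cache-unfriendly, which directly addresses the diagnosis concern raised above.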

> How about customizing batching strategy?

I don't recommend allowing users to fully customize the batching strategy,
as very few users would be able to implement such a strategy effectively.
However, we could offer some configuration options that let users tune the
batching behavior, such as the maximum batch size.
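Regarding the OOM concern with extremely large batches: a configurable maximum batch size bounds how many results must be held in memory at once. A minimal sketch of the idea (hypothetical names, not the actual FLIP-426 implementation):

```python
# Hypothetical sketch: split buffered request keys into size-bounded batches,
# so no single batch has to hold an unbounded number of results in memory.
def split_into_batches(keys, max_batch_size):
    """Yield consecutive chunks of at most max_batch_size keys."""
    for i in range(0, len(keys), max_batch_size):
        yield keys[i:i + max_batch_size]


# ten pending keys, capped at 4 per batch -> three batches
batches = list(split_into_batches(list(range(10)), max_batch_size=4))
```

Each size-bounded batch would then be issued as its own MultiGet, trading a few extra round trips for a predictable memory ceiling.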

Best,
Jinzhong


On Fri, Mar 15, 2024 at 12:08 PM 夏 瑞 <xiarui0...@hotmail.com> wrote:

> Hi Jinzhong,
>
> Batching state access is a reasonable way to reduce the amount of I/O
> compared to per-record state access. But I have some questions:
>
> - In my opinion, we need to reduce the number of times RocksDB SST files are
> fetched from remote to local. The FLIP seems to batch the RocksDB put/get
> requests. I am not sure this will reduce the number of SST fetches.
>
> - How do we monitor the I/O used by state disaggregation? The latency/amount
> of I/O on DFS is important for performance diagnosis. Moreover, the
> amount of I/O also influences the stability of the DFS. For example, the
> throughput of the HDFS NameNode is hard to scale and suffers under an I/O flood.
>
> - How about customizing batching strategy? Intuitively, an extremely large
> batch may need lots of memory to hold the returned results, and can cause OOM.
> On the other side, if we batch keys randomly, the state storage may have to
> scan all SSTs to find the results.
>
> Best wishes,
> Rui Xia.
>
> ________________________________
> From: Hangxiang Yu <master...@gmail.com>
> Sent: March 15, 2024 4:04
> To: xiarui0...@hotmail.com <xiarui0...@hotmail.com>
> Subject: Fwd: [DISCUSS] FLIP-426: Grouping Remote State Access
>
>
>
> ---------- Forwarded message ---------
> From: Jinzhong Li <lijinzhong2...@gmail.com>
> Date: Thu, Mar 7, 2024 at 4:52 PM
> Subject: [DISCUSS] FLIP-426: Grouping Remote State Access
> To: <dev@flink.apache.org>
> Cc: <yuanmei.w...@gmail.com>, <zakelly....@gmail.com>, <master...@gmail.com>,
> <fredia...@gmail.com>, <fengw...@apache.org>
>
>
>
> Hi devs,
>
>
> I'd like to start a discussion on a sub-FLIP of FLIP-423: Disaggregated
> State Storage and Management[1], which is a joint work of Yuan Mei, Zakelly
> Lan, Jinzhong Li, Hangxiang Yu, Yanfei Lei and Feng Wang:
>
> - FLIP-426: Grouping Remote State Access [2]
> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-426%3A+Grouping+Remote+State+Access>
>
> This FLIP enables retrieval of remote state data in batches to avoid
> unnecessary round-trip costs for remote access.
>
> Please make sure you have read FLIP-423[1] to know the whole story,
> and we'll discuss the details of FLIP-426[2] under this mail. For the
> discussion of the overall architecture or topics spanning multiple
> sub-FLIPs, please post in the previous mail[3].
>
> Looking forward to hearing from you!
>
> [1] https://cwiki.apache.org/confluence/x/R4p3EQ
>
> [2] https://cwiki.apache.org/confluence/x/TYp3EQ
>
> [3] https://lists.apache.org/thread/ct8smn6g9y0b8730z7rp9zfpnwmj8vf0
>
> Best,
>
> Jinzhong Li
>
>
>
>
> --
> Best,
> Hangxiang.
>
