Hi Liebing,

Thanks for the detailed FIP. I have a few questions:
1. Does `remote.data.dirs` support paths with different schemes? For example:
```
remote.data.dirs: oss://bucket1/fluss-data, s3://bucket2/fluss-data
```

2. Should `GetFileSystemSecurityTokenRequest` include partition?
The FIP adds `table_path` to the request, but since different partitions 
may reside on different remote paths (and require different tokens), 
should the request also include partition information?
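
For concreteness, something like the following shape is what I have in mind
(field names and numbers are illustrative, not taken from the FIP):
```
message GetFileSystemSecurityTokenRequest {
  optional string table_path = 1;   // added by this FIP
  optional int64 partition_id = 2;  // proposed addition for partitioned tables
}
```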

3. Just a reminder that `DefaultSecurityTokenManager` will become more 
complex once it has to manage tokens for multiple filesystems.
This is not a blocker, but it may be worth a PoC to surface any complexity early.

4. I want to confirm my understanding: For a partitioned table, does the table 
itself have a remote dir, AND each partition also has its own remote dir?

Or is it:
- Non-partitioned table → table-level remote dir
- Partitioned table → only partition-level remote dirs (no table-level)?
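
To illustrate the two interpretations (paths below are purely made up):
```
# Interpretation A: table-level remote dir AND per-partition remote dirs
oss://bucket1/fluss-data/db1/t1/               # table remote dir
oss://bucket2/fluss-data/db1/t1/p=20260101/    # partition remote dir

# Interpretation B: partitioned tables only have partition-level remote dirs
oss://bucket1/fluss-data/db1/t1/p=20260101/
oss://bucket2/fluss-data/db1/t1/p=20260102/
```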

5. Can old clients (without table path in the token request) still read data 
from new clusters?
One possible solution: for RPCs without table information, the server 
returns a token for the first dir in `remote.data.dirs`. Alternatively, a 
cluster configuration option could let users preserve compatibility in 
other ways.
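
To make that fallback concrete, here is a minimal sketch of the server-side dir
selection (class and method names are hypothetical, not actual Fluss APIs):

```java
import java.util.List;
import java.util.Optional;

// Illustrative sketch only: TokenDirSelector is a hypothetical helper, not
// part of the actual Fluss codebase.
public class TokenDirSelector {

    // Picks the remote data dir to issue a security token for. New clients
    // resolve a dir from the table path; old clients send no table path, so
    // we fall back to the first configured dir in remote.data.dirs.
    public static String selectDir(Optional<String> dirForTable,
                                   List<String> remoteDataDirs) {
        if (remoteDataDirs.isEmpty()) {
            throw new IllegalStateException("remote.data.dirs must not be empty");
        }
        return dirForTable.orElse(remoteDataDirs.get(0));
    }
}
```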



On 2026/01/21 03:52:29 Zhe Wang wrote:
> Thanks for your response, now it looks good to me.
> 
> Best regards,
> Zhe Wang
> 
> Liebing Yu <[email protected]> 于2026年1月20日周二 14:29写道:
> 
> > Hi Zhe, sorry for the late reply.
> >
> > The primary focus of this FIP is not to address read/write issues at the
> > table or partition level, but rather to overcome limitations at the cluster
> > level. Given the current capabilities of object storage, read/write
> > performance for a single table or partition is unlikely to be a bottleneck;
> > however, for a large-scale Fluss cluster, it can easily become one.
> > Therefore, the core objective here is to distribute the cluster-wide
> > read/write traffic across multiple remote storage systems.
> >
> > Best regards,
> > Liebing Yu
> >
> >
> > On Wed, 14 Jan 2026 at 16:07, Zhe Wang <[email protected]> wrote:
> >
> > > Hi Liebing, Thanks for the clarification.
> > > >1. To clarify, the data is currently split by partition level for
> > > partitioned tables and by table for non-partitioned tables.
> > >
> > > Therefore the main aim of this FIP is improving the speed of reading
> > > data from different partitions, while the speed of storing data may
> > > still be limited for a single system?
> > >
> > > Best,
> > > Zhe Wang
> > >
> > > Liebing Yu <[email protected]> 于2026年1月13日周二 19:11写道:
> > >
> > > > Hi Zhe, Thanks for the questions!
> > > >
> > > > 1. To clarify, the data is currently split by partition level for
> > > > partitioned tables and by table for non-partitioned tables.
> > > >
> > > > 2. Regarding RemoteStorageCleaner, you are absolutely right. Supporting
> > > > remote.data.dirs there is necessary for a complete cleanup when a table
> > > is
> > > > dropped.
> > > >
> > > > Thanks for pointing that out!
> > > >
> > > > Best regards,
> > > > Liebing Yu
> > > >
> > > >
> > > > On Mon, 12 Jan 2026 at 17:02, Zhe Wang <[email protected]> wrote:
> > > >
> > > > > Hi Liebing,
> > > > >
> > > > > Thanks for driving this, I think it's a really useful feature.
> > > > > I have two small questions:
> > > > > 1. What's the scope for splitting data across dirs? I see there's a
> > > > > partitionId in ZK Data, so will the data be split by partition into
> > > > > different directories, or by bucket?
> > > > > 2. Maybe RemoteStorageCleaner needs to support remote.data.dirs, so
> > > > > we can delete all remote storage when a table is dropped.
> > > > >
> > > > > Best,
> > > > > Zhe Wang
> > > > >
> > > > > Liebing Yu <[email protected]> 于2026年1月8日周四 20:10写道:
> > > > >
> > > > > > Hi devs,
> > > > > >
> > > > > > I propose initiating discussion on FIP-25[1]. Fluss leverages
> > > > > > remote storage systems, such as Amazon S3, HDFS, and Alibaba Cloud
> > > > > > OSS, to deliver a cost-efficient, highly available, and
> > > > > > fault-tolerant storage solution compared to local disk. However, in
> > > > > > production environments, we often find that the bandwidth of a
> > > > > > single remote storage becomes a bottleneck. Taking OSS[2] as an
> > > > > > example, the typical upload bandwidth limit for a single account is
> > > > > > 20 Gbit/s (internal) and 10 Gbit/s (public). So I initiated this
> > > > > > FIP, which aims to introduce support for multiple remote storage
> > > > > > paths and enable the dynamic addition of new storage paths without
> > > > > > service interruption.
> > > > > >
> > > > > > Any feedback and suggestions on this proposal are welcome!
> > > > > >
> > > > > > [1]
> > > > > > https://cwiki.apache.org/confluence/display/FLUSS/FIP-25%3A+Support+Multi-Location+for+Remote+Storage
> > > > > > [2]
> > > > > > https://www.alibabacloud.com/help/en/oss/user-guide/limits?spm=a2c63.l28256.help-menu-31815.d_0_0_5.2ac34d06oZYFvK
> > > > > >
> > > > > > Best regards,
> > > > > > Liebing Yu
> > > > > >
> > > > >
> > > >
> > >
> >
> 
