Re: [Discuss] FIP-25: Support Multi-Location for Remote Storage

Liebing Yu Thu, 29 Jan 2026 22:59:50 -0800

Hi Yuxia, thanks for the thoughtful response. Let me go through your
questions one by one.


1. I think once we support `remote.data.dirs`, paths with different schemes
will be supported naturally.
2. Yes, I think we should change from `PbTablePath` to
`PbPhysicalTablePath`.
3. Thanks for the reminder. I'll do a PoC for authentication in
https://github.com/apache/fluss/issues/2518, but it doesn't block the
multi-path implementation in the Fluss server in
https://github.com/apache/fluss/issues/2517.
4. For a partitioned table, the table itself has a remote data dir for
metadata (such as lake offsets), and each partition has its own remote dir
for table data (e.g., KV or log data).
5. Legacy clients can access data in the new cluster.

   - If the permissions of the paths specified in `remote.data.dirs` on the
   new cluster match those configured in `remote.data.dir`, seamless access is
   achievable.
   - If the permissions are inconsistent, access permissions must be
   explicitly configured. For example, when using OSS, a policy granting
   access permissions to the account identified by `fs.oss.roleArn` must be
   configured for each bucket specified in `remote.data.dirs`.
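
A minimal Python sketch of how points 1, 4 and 5 could fit together, under stated assumptions: it is not Fluss code, and `parse_remote_data_dirs`, `scheme_of`, `RemoteDirAssigner`, the round-robin placement policy and the key format are all hypothetical. It parses a mixed-scheme `remote.data.dirs` value, pins one dir to a table (metadata) and one to each partition (data), and falls back to the first dir for token requests that carry no table path, as suggested in question 5.

```python
import itertools
from typing import Optional
from urllib.parse import urlparse

# Hypothetical sketch, not Fluss code: the function/class names, the
# round-robin placement policy and the key format are all assumptions.


def parse_remote_data_dirs(value: str) -> list[str]:
    """Split a comma-separated `remote.data.dirs` value into paths."""
    dirs = [d.strip() for d in value.split(",") if d.strip()]
    if not dirs:
        raise ValueError("remote.data.dirs must contain at least one path")
    return dirs


def scheme_of(path: str) -> str:
    """Each dir carries its own scheme (oss, s3, hdfs, ...)."""
    return urlparse(path).scheme


class RemoteDirAssigner:
    """Pins a remote dir to each table and to each partition (point 4)."""

    def __init__(self, dirs: list[str]):
        self._dirs = dirs
        self._cycle = itertools.cycle(dirs)  # assumed round-robin policy
        self._assigned: dict[str, str] = {}

    def assign(self, key: str) -> str:
        # key: a table path (table-level dir, e.g. for lake offsets) or a
        # "table/partition" string (partition-level dir for KV/log data).
        if key not in self._assigned:
            self._assigned[key] = next(self._cycle)
        return self._assigned[key]

    def dir_for_token(self, key: Optional[str]) -> str:
        # Legacy clients send no table path, so fall back to the first
        # configured dir (the option suggested in question 5).
        if key is None:
            return self._dirs[0]
        return self._assigned[key]  # KeyError if never assigned
```

For example, with `remote.data.dirs: oss://bucket1/fluss-data, s3://bucket2/fluss-data`, `assign("db.orders")` returns the OSS dir, `assign("db.orders/p=2026-01-29")` returns the S3 dir, and `dir_for_token(None)` returns the OSS dir. Round-robin is only a placeholder; a weighted or load-aware choice could replace `next(self._cycle)` without changing the callers.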


Best regards,
Liebing Yu


On Thu, 29 Jan 2026 at 10:07, Yuxia Luo <[email protected]> wrote:

> Hi, Liebing
>
> Thanks for the detailed FIP. I have a few questions:
> 1. Does `remote.data.dirs` support paths with different schemes? For
> example:
> ```
> remote.data.dirs: oss://bucket1/fluss-data, s3://bucket2/fluss-data
> ```
>
> 2. Should `GetFileSystemSecurityTokenRequest` include partition?
> The FIP adds `table_path` to the request, but since different partitions
> may reside on different remote paths (and require different tokens),
> should the request also include partition information?
>
> 3. Just a reminder that `DefaultSecurityTokenManager` will become more
> complex...
> This is not a blocker, but it is worth a PoC to identify any complexity.
>
> 4. I want to confirm my understanding: For a partitioned table, does the
> table itself have a remote dir, AND each partition also has its own remote
> dir?
>
> Or is it:
> - Non-partitioned table → table-level remote dir
> - Partitioned table → only partition-level remote dirs (no table-level)?
>
> 5. Can old clients (without table path in token request) still read data
> from new clusters?
> One possible solution: for RPCs without table information, the server
> returns a token for the first dir in `remote.data.dirs`. Alternatively,
> there may be other ways that allow users to configure the cluster to keep
> compatibility.
>
>
>
> On 2026/01/21 03:52:29 Zhe Wang wrote:
> > Thanks for your response, now it looks good to me.
> >
> > Best regards,
> > Zhe Wang
> >
> > Liebing Yu <[email protected]> wrote on Tue, 20 Jan 2026 at 14:29:
> >
> > > Hi Zhe, sorry for the late reply.
> > >
> > > The primary focus of this FIP is not to address read/write issues at the
> > > table or partition level, but rather to overcome limitations at the
> > > cluster level. Given the current capabilities of object storage,
> > > read/write performance for a single table or partition is unlikely to be
> > > a bottleneck; however, for a large-scale Fluss cluster, it can easily
> > > become one. Therefore, the core objective here is to distribute the
> > > cluster-wide read/write traffic across multiple remote storage systems.
> > >
> > > Best regards,
> > > Liebing Yu
> > >
> > >
> > > On Wed, 14 Jan 2026 at 16:07, Zhe Wang <[email protected]> wrote:
> > >
> > > > Hi Liebing, thanks for the clarification.
> > > > > 1. To clarify, the data is currently split by partition level for
> > > > > partitioned tables and by table for non-partitioned tables.
> > > >
> > > > So is the main aim of this FIP to improve the speed of reading data
> > > > from different partitions, while the write speed may still be limited
> > > > by a single storage system?
> > > >
> > > > Best,
> > > > Zhe Wang
> > > >
> > > > Liebing Yu <[email protected]> wrote on Tue, 13 Jan 2026 at 19:11:
> > > >
> > > > > Hi Zhe, Thanks for the questions!
> > > > >
> > > > > 1. To clarify, the data is currently split by partition level for
> > > > > partitioned tables and by table for non-partitioned tables.
> > > > >
> > > > > 2. Regarding RemoteStorageCleaner, you are absolutely right.
> > > > > Supporting remote.data.dirs there is necessary for a complete cleanup
> > > > > when a table is dropped.
> > > > >
> > > > > Thanks for pointing that out!
> > > > >
> > > > > Best regards,
> > > > > Liebing Yu
> > > > >
> > > > >
> > > > > On Mon, 12 Jan 2026 at 17:02, Zhe Wang <[email protected]> wrote:
> > > > >
> > > > > > Hi Liebing,
> > > > > >
> > > > > > Thanks for driving this, I think it's a really useful feature.
> > > > > > I have two small questions:
> > > > > > 1. At what granularity is data split across dirs? I see there's a
> > > > > > partitionId in the ZK data, so will the data be split by partition
> > > > > > into different directories, or by bucket?
> > > > > > 2. Should RemoteStorageCleaner also support remote.data.dirs, so
> > > > > > that all remote storage can be deleted when a table is dropped?
> > > > > >
> > > > > > Best,
> > > > > > Zhe Wang
> > > > > >
> > > > > > Liebing Yu <[email protected]> wrote on Thu, 8 Jan 2026 at 20:10:
> > > > > >
> > > > > > > Hi devs,
> > > > > > >
> > > > > > > I propose initiating a discussion on FIP-25[1]. Fluss leverages
> > > > > > > remote storage systems, such as Amazon S3, HDFS, and Alibaba Cloud
> > > > > > > OSS, to deliver a cost-efficient, highly available, and
> > > > > > > fault-tolerant storage solution compared to local disk. *However,
> > > > > > > in production environments, we often find that the bandwidth of a
> > > > > > > single remote storage becomes a bottleneck.* Taking OSS[2] as an
> > > > > > > example, the typical upload bandwidth limit for a single account is
> > > > > > > 20 Gbit/s (internal) and 10 Gbit/s (public). So I initiated this
> > > > > > > FIP, which aims to introduce support for multiple remote storage
> > > > > > > paths and to enable the dynamic addition of new storage paths
> > > > > > > without service interruption.
> > > > > > >
> > > > > > > Any feedback and suggestions on this proposal are welcome!
> > > > > > >
> > > > > > > [1] https://cwiki.apache.org/confluence/display/FLUSS/FIP-25%3A+Support+Multi-Location+for+Remote+Storage
> > > > > > > [2] https://www.alibabacloud.com/help/en/oss/user-guide/limits?spm=a2c63.l28256.help-menu-31815.d_0_0_5.2ac34d06oZYFvK
> > > > > > >
> > > > > > > Best regards,
> > > > > > > Liebing Yu
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>