[I] Support enabling lakehouse for pre-created tables with a compatible cluster-level gate [fluss]

via GitHub Fri, 20 Mar 2026 02:13:37 -0700


luoyuxia opened a new issue, #2908:
URL: https://github.com/apache/fluss/issues/2908


   ## Search before asking
   - [x] I searched in the [issues](https://github.com/alibaba/fluss/issues) 
and found nothing similar.
   
   ## Motivation
   Today, enabling lakehouse for an existing table only works reliably if the 
table was created after the cluster had already enabled datalake support. This 
causes a compatibility problem for the following user flow:
   
   1. Create a Fluss table when the cluster has not explicitly enabled 
lakehouse.
   2. Later configure the cluster to enable lakehouse.
   3. Enable lakehouse for the existing table.
   
   At the moment, step 3 fails for tables created before cluster-level 
lakehouse was enabled. The root issue is that `datalake.format` currently 
serves two roles at the same time:
   
   - selecting the lake-format-specific bucketing / key-encoding behavior; and
   - indicating that the cluster is ready to create and manage lake tables.
   
   This makes the semantics unclear for new deployments that want to pre-bind 
the future lake format (for example Paimon, so that bucketing stays consistent) 
but do not want users to enable lakehouse for tables until the cluster is 
explicitly switched on.
   
   We need a backward-compatible way to separate "legacy cluster behavior" from 
"new cluster behavior", while still allowing tables that were created without 
`table.datalake.enabled=true` to have lakehouse enabled later, provided their 
bucketing format is already predetermined.
   
   ## Solution
   Introduce a new cluster config `datalake.enabled` with compatibility 
semantics:
   
   - `datalake.enabled` is **unset**: treat the cluster as a legacy cluster and 
keep the current behavior unchanged.
   - `datalake.enabled=false`: treat the cluster as a new-style cluster in 
"pre-bind only" mode.
   - `datalake.enabled=true`: treat the cluster as a new-style cluster with 
lakehouse fully enabled.
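
The three states above amount to a tri-state resolution of one optional config value. A minimal Python sketch of that mapping, assuming the raw value arrives as an optional string (`None` when unset); `ClusterLakeMode` and `resolve_mode` are hypothetical names used only for illustration and are not Fluss APIs:

```python
from enum import Enum
from typing import Optional


class ClusterLakeMode(Enum):
    LEGACY = "legacy"                # datalake.enabled unset: current behavior
    PRE_BIND_ONLY = "pre-bind-only"  # datalake.enabled=false
    ENABLED = "enabled"              # datalake.enabled=true


def resolve_mode(datalake_enabled: Optional[str]) -> ClusterLakeMode:
    """Map the raw `datalake.enabled` value (None when unset) to a mode."""
    if datalake_enabled is None:
        return ClusterLakeMode.LEGACY
    value = datalake_enabled.strip().lower()
    if value == "true":
        return ClusterLakeMode.ENABLED
    if value == "false":
        return ClusterLakeMode.PRE_BIND_ONLY
    # Assumption: any other value is treated as a configuration error.
    raise ValueError(f"invalid datalake.enabled value: {datalake_enabled!r}")
```

The key point is that "unset" and "false" are deliberately distinct: only an unset value keeps legacy behavior.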
   
   For clusters where `datalake.enabled` is explicitly configured (either 
`true` or `false`):
   
   - require `datalake.format` to be configured;
   - automatically persist `table.datalake.format=<cluster datalake.format>` 
into newly created tables;
   - when `datalake.enabled=false`, do **not** allow creating/enabling lake 
tables yet;
   - when `datalake.enabled=true`, allow `ALTER TABLE ... SET 
('table.datalake.enabled'='true')` for tables whose `table.datalake.format` 
already matches the cluster `datalake.format`.
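
The table-creation part of these rules could look roughly like the following Python sketch. It models configs as plain dictionaries; `apply_table_defaults` is a hypothetical helper, not the actual `CoordinatorService.applySystemDefaults(...)` logic, and it assumes the cluster format always overwrites any table-level value, which the proposal does not specify. Legacy behavior is not modeled; the sketch returns the properties untouched in that case.

```python
from typing import Dict


def apply_table_defaults(cluster_conf: Dict[str, str],
                         table_props: Dict[str, str]) -> Dict[str, str]:
    """Return the properties to persist for a newly created table."""
    props = dict(table_props)
    if "datalake.enabled" not in cluster_conf:
        # Legacy cluster: existing behavior applies and is not modeled here.
        return props

    cluster_format = cluster_conf.get("datalake.format")
    if cluster_format is None:
        # Explicitly configured clusters must also configure the format.
        raise ValueError("datalake.enabled is set but datalake.format is missing")

    # Pre-bind the lake format so bucketing stays consistent later.
    props["table.datalake.format"] = cluster_format

    cluster_enabled = cluster_conf["datalake.enabled"].strip().lower() == "true"
    wants_lake = props.get("table.datalake.enabled", "false").lower() == "true"
    if wants_lake and not cluster_enabled:
        # Pre-bind only mode: lake tables cannot be created yet.
        raise ValueError("cluster has datalake.enabled=false")
    return props
```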
   
   This keeps old clusters fully compatible while enabling the desired flow for 
new clusters:
   
   1. Create cluster with `datalake.enabled=false` and `datalake.format=paimon`.
   2. Create table; Fluss auto-persists `table.datalake.format=paimon`, so 
writes already follow Paimon bucketing.
   3. Later switch cluster to `datalake.enabled=true`.
   4. Enable lakehouse for the existing table successfully.
   
   Suggested validation rules:
   
   - If `datalake.enabled` is explicitly set but `datalake.format` is missing, 
fail fast.
   - If a table has no persisted `table.datalake.format`, keep rejecting later 
lakehouse enablement to avoid bucket inconsistency.
   - If a table's `table.datalake.format` differs from the cluster 
`datalake.format`, reject enablement.
   - In new-style clusters, `datalake.format` should be treated as immutable 
(or at least strongly restricted) once tables have been created with the 
pre-bound format.
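
As an illustration of the first three rules, here is a minimal Python sketch of the alter-table check for new-style clusters. `rejection_reason` is a hypothetical function name, configs are modeled as plain dictionaries, and the immutability rule for `datalake.format` is not covered.

```python
from typing import Dict, Optional


def rejection_reason(cluster_conf: Dict[str, str],
                     table_props: Dict[str, str]) -> Optional[str]:
    """Return why enabling lakehouse on a table is rejected, or None if allowed."""
    if "datalake.enabled" not in cluster_conf:
        # Legacy clusters keep today's behavior, which this sketch does not model.
        raise ValueError("legacy cluster: not covered by this sketch")

    cluster_format = cluster_conf.get("datalake.format")
    if cluster_format is None:
        return "datalake.enabled is set but datalake.format is missing"
    if cluster_conf["datalake.enabled"].strip().lower() != "true":
        return "cluster is in pre-bind only mode"

    table_format = table_props.get("table.datalake.format")
    if table_format is None:
        return "table has no persisted table.datalake.format"
    if table_format != cluster_format:
        return "table.datalake.format differs from cluster datalake.format"
    return None
```

With a table that persisted `table.datalake.format=paimon`, this check rejects enablement while the cluster has `datalake.enabled=false` and allows it once the cluster is switched to `datalake.enabled=true`, matching the four-step flow described earlier.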
   
   Affected areas likely include:
   
   - cluster config parsing / compatibility checks;
   - `CoordinatorService.applySystemDefaults(...)`;
   - `LakeCatalogDynamicLoader` load conditions;
   - alter-table validation for `table.datalake.enabled`.
   
   ## Anything else?
   This issue is mainly about compatibility and semantic clarity:
   
   - old clusters should continue to behave exactly as they do today;
   - new clusters should be able to pre-bind lake-format bucketing without 
exposing lakehouse functionality too early;
   - users should be able to create a table first, enable cluster lakehouse 
later, and then enable lakehouse on that table successfully.
   
   ## Willingness to contribute
   - [x] I'm willing to submit a PR!
   


