luoyuxia opened a new issue, #3297:
URL: https://github.com/apache/fluss/issues/3297

   ### Search before asking
   
   - [x] I searched in the [issues](https://github.com/apache/fluss/issues) and 
found nothing similar.
   
   ### Fluss version
   
   main (development)
   
   ### Please describe the bug 🐞
   
   `MetadataManager#preAlterTableProperties` currently calls 
`lakeCatalog.alterTable(...)` whenever a lake catalog is configured:
   
   ```java
   // We should always alter lake table even though datalake is disabled.
   // Otherwise, if user alter the fluss table when datalake is disabled, then
   // enable datalake again, the lake table will mismatch.
   if (lakeCatalog != null) {
       try {
           lakeCatalog.alterTable(tablePath, tableChanges, lakeCatalogContext);
       } catch (TableNotExistException e) {
           // only throw TableNotExistException if datalake is enabled
           if (isDataLakeEnabled(newDescriptor)) {
               throw new FlussRuntimeException(
                       "Lake table doesn't exist for lake-enabled table "
                               + tablePath
                               + ", which shouldn't be happened. Please check "
                               + "if the lake table was deleted manually.",
                       e);
           }
       }
   }
   ```
   
   The intention is reasonable for tables that have been lake-enabled before: 
if the user disables datalake, alters the Fluss table, and enables datalake 
again later, the lake table metadata should still stay in sync.
   
   However, the current condition is too broad. For a normal Fluss table that 
has never been a lake table, any alter-table-properties operation still calls 
`lakeCatalog.alterTable(...)` when `lakeCatalog != null`. This adds an 
unnecessary call to the lake catalog on every such alter, even though there is 
no corresponding lake table to update.
   
   One triggering sequence is:
   
   1. Start a Fluss cluster with a lake catalog configured.
   2. Create a normal Fluss table without setting `table.datalake.enabled`.
   3. Alter unrelated table properties on that normal table.
   4. `MetadataManager` still calls `lakeCatalog.alterTable(...)` for the table 
and suppresses `TableNotExistException` because the new descriptor is not 
datalake-enabled.
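
The sequence above can be reproduced in isolation with a stand-in catalog. The following is a minimal sketch, not the real Fluss code: `RecordingLakeCatalog`, the simplified `alterTable(String)` signature, the local `TableNotExistException`, and the property name `some.unrelated.property` are all hypothetical and exist only to mirror the control flow quoted earlier.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CurrentAlterFlowSketch {

    /** Hypothetical stand-in for the real TableNotExistException. */
    static class TableNotExistException extends Exception {
        TableNotExistException(String message) {
            super(message);
        }
    }

    /** Hypothetical stub: records every alter call and contains no lake tables. */
    static class RecordingLakeCatalog {
        final List<String> alterCalls = new ArrayList<>();

        void alterTable(String tablePath) throws TableNotExistException {
            alterCalls.add(tablePath);
            throw new TableNotExistException(tablePath);
        }
    }

    static boolean isDataLakeEnabled(Map<String, String> properties) {
        // A missing key parses as false, i.e. datalake is treated as disabled.
        return Boolean.parseBoolean(properties.get("table.datalake.enabled"));
    }

    /** Simplified mirror of the control flow quoted in this issue. */
    static void preAlterTableProperties(
            RecordingLakeCatalog lakeCatalog,
            String tablePath,
            Map<String, String> newProperties) {
        if (lakeCatalog != null) {
            try {
                lakeCatalog.alterTable(tablePath);
            } catch (TableNotExistException e) {
                // Only surfaced when the new descriptor is datalake-enabled.
                if (isDataLakeEnabled(newProperties)) {
                    throw new IllegalStateException(
                            "Lake table doesn't exist for lake-enabled table " + tablePath, e);
                }
            }
        }
    }

    public static void main(String[] args) {
        RecordingLakeCatalog catalog = new RecordingLakeCatalog();
        Map<String, String> properties = new HashMap<>();
        properties.put("some.unrelated.property", "value"); // hypothetical property

        preAlterTableProperties(catalog, "db.normal_table", properties);

        // The catalog was contacted even though the table never opted into datalake.
        System.out.println("lake catalog alter calls: " + catalog.alterCalls.size());
    }
}
```

In this sketch the stub is called once and the exception is swallowed, which matches step 4: the call is wasted work from the caller's point of view.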
   
   Expected behavior:
   
   For tables that have never opted into datalake, altering Fluss table 
properties should not call `lakeCatalog.alterTable(...)`.
   
   Actual behavior:
   
   As long as a lake catalog exists, `MetadataManager` attempts to alter the 
lake table for non-lake tables as well.
   
   ### Solution
   
   A simple mitigation is to only alter the lake table when the table 
descriptor contains the `table.datalake.enabled` key, regardless of whether the 
value is `true` or `false`.
   
   This keeps the existing behavior for the common case where a table was 
previously lake-enabled and later disabled, because such tables normally retain 
`table.datalake.enabled=false`. It avoids calling the lake catalog for tables 
that never set `table.datalake.enabled`.
   
   This is not a perfect historical signal, because a user could create a table 
with `table.datalake.enabled=false` from the beginning, but it should address 
most cases without requiring additional metadata.
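
The proposed check could look roughly like the sketch below. It is an illustration under assumptions rather than the actual patch: `shouldAlterLakeTable` is a hypothetical helper, table properties are modelled as a plain `Map<String, String>`, and the issue does not say whether the old or the new descriptor should be inspected, so the sketch leaves that choice to the caller.

```java
import java.util.HashMap;
import java.util.Map;

public class LakeAlterGuardSketch {

    static final String DATALAKE_ENABLED_KEY = "table.datalake.enabled";

    /**
     * Hypothetical helper: alter the lake table only if a lake catalog is configured
     * and the table has the datalake key set at all, whatever its value.
     */
    static boolean shouldAlterLakeTable(
            boolean lakeCatalogConfigured, Map<String, String> tableProperties) {
        return lakeCatalogConfigured && tableProperties.containsKey(DATALAKE_ENABLED_KEY);
    }

    public static void main(String[] args) {
        Map<String, String> neverOptedIn = new HashMap<>();

        Map<String, String> disabledLater = new HashMap<>();
        disabledLater.put(DATALAKE_ENABLED_KEY, "false");

        // Never set the key: the lake catalog is skipped.
        System.out.println(shouldAlterLakeTable(true, neverOptedIn)); // false

        // Previously enabled, now disabled: the lake table is still kept in sync.
        System.out.println(shouldAlterLakeTable(true, disabledLater)); // true
    }
}
```

As noted above, a table created with `table.datalake.enabled=false` from the start would still pass this check, so the guard narrows the set of unnecessary calls rather than eliminating it.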
   
   ### Are you willing to submit a PR?
   
   - [ ] I'm willing to submit a PR!
   

