Re: [I] Tolerate lake table existent if the schema and properties matches when creating lake enabled table [fluss]

via GitHub Mon, 13 Oct 2025 23:52:30 -0700


LiebingYu commented on issue #846:
URL: https://github.com/apache/fluss/issues/846#issuecomment-3400363626


   > [@luoyuxia](https://github.com/luoyuxia) If no one is working on it, I’m 
willing to take it on.
   > 
   > I plan to add a new method to the `LakeCatalog` interface. Each lake plugin can 
then implement its own logic, and in `CoordinatorService` we can check whether the 
existing lake table's `TableDescriptor` is compatible with the Fluss table's 
`TableDescriptor`.
   > 
   > @PublicEvolving
   > public interface LakeCatalog extends AutoCloseable {
   > 
   >     /**
   >      * Get a table in lake.
   >      *
   >      * @param tablePath path of the table to get
   >      * @throws TableNotExistException if the table does not exist
   >      */
   >     TableDescriptor getTable(TablePath tablePath) throws TableNotExistException;
   > }
   
   After some attempts, I found it is difficult to rebuild a `TableDescriptor` 
from a Paimon `Table`. For example:
   ```sql
   -- create a fluss table
   CREATE TABLE `fluss_catalog`.`fluss`.`fluss_t1` (
     `a` VARCHAR(2147483647),
     `b` VARCHAR(2147483647)
   ) WITH (
     'table.replication.factor' = '1',
     'table.datalake.format' = 'paimon',
     'table.datalake.freshness' = '30s',
     'table.datalake.paimon.metastore' = 'filesystem',
     'table.datalake.enabled' = 'true',
     'bucket.num' = '1',
     'table.datalake.paimon.warehouse' = '/tmp/paimon',
     'bootstrap.servers' = 'localhost:9123',
     'lookup.max-retries' = '3'
   );
   
   -- get lake table
   -- will have extra options: bucket, path
   -- In addition, there are options such as bucket-key and branch. Attempting
   -- to exhaustively enumerate all possible options that Paimon might add is
   -- error-prone.
   CREATE TABLE `paimon`.`fluss`.`fluss_t1` (
     `a` VARCHAR(2147483647),
     `b` VARCHAR(2147483647),
     `__bucket` INT,
     `__offset` BIGINT,
     `__timestamp` TIMESTAMP(6) WITH LOCAL TIME ZONE
   ) WITH (
     'bucket' = '-1',
     'fluss.lookup.max-retries' = '3',
     'path' = 'file:/tmp/paimon/fluss.db/fluss_t1',
     'fluss.table.replication.factor' = '1',
     'fluss.table.datalake.enabled' = 'true',
     'fluss.bucket.num' = '1',
     'fluss.table.datalake.format' = 'paimon',
     'fluss.table.datalake.freshness' = '30s'
   );
   ```
   
   Therefore, my proposal is that in `LakeCatalog#createTable`, if the table 
already exists, we should directly compare the schemas of the two Paimon 
tables to check for consistency. Of course, this has a drawback: if the 
newly created Fluss table changes a property of the existing table, even one 
that is allowed to change, an exception will still be thrown.
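
As a rough illustration of what such a comparison could look like, here is a minimal sketch. The `Column` and `LakeSchema` records and the `isCompatible` method are hypothetical stand-ins made up for this example; they are not Fluss or Paimon APIs. The sketch also assumes that options present only on the existing table (such as `path`) are tolerated, which is one possible policy rather than something decided in this thread.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch only: none of these types are Fluss or Paimon APIs. */
final class LakeSchemaCheckSketch {

    /** Simplified column: a name plus a type rendered as a string. */
    record Column(String name, String type) {}

    /** Simplified lake table schema: ordered columns, primary keys, and options. */
    record LakeSchema(
            List<Column> columns, List<String> primaryKeys, Map<String, String> options) {}

    /**
     * Returns true if the schema we would create matches the existing one.
     * Columns and primary keys must be identical (including order). Every option
     * we would set must have the same value on the existing table; options that
     * only the existing table has are ignored (an assumption of this sketch).
     */
    static boolean isCompatible(LakeSchema expected, LakeSchema existing) {
        if (!expected.columns().equals(existing.columns())) {
            return false;
        }
        if (!expected.primaryKeys().equals(existing.primaryKeys())) {
            return false;
        }
        for (Map.Entry<String, String> e : expected.options().entrySet()) {
            if (!e.getValue().equals(existing.options().get(e.getKey()))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Column> columns = List.of(
                new Column("a", "STRING"),
                new Column("b", "STRING"),
                new Column("__bucket", "INT"),
                new Column("__offset", "BIGINT"),
                new Column("__timestamp", "TIMESTAMP_LTZ(6)"));

        LakeSchema expected = new LakeSchema(columns, List.of(), Map.of(
                "bucket", "-1",
                "fluss.table.datalake.freshness", "30s"));

        // The existing table carries an extra 'path' option added by the lake side.
        LakeSchema existing = new LakeSchema(columns, List.of(), Map.of(
                "bucket", "-1",
                "fluss.table.datalake.freshness", "30s",
                "path", "file:/tmp/paimon/fluss.db/fluss_t1"));

        // Same table, but the freshness property was changed on the Fluss side.
        LakeSchema changed = new LakeSchema(columns, List.of(), Map.of(
                "bucket", "-1",
                "fluss.table.datalake.freshness", "1min"));

        System.out.println(isCompatible(expected, existing));
        System.out.println(isCompatible(changed, existing));
    }
}
```

The second call shows the drawback mentioned above: a changed property value is reported as incompatible even if that property could legitimately be altered.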
   
   What do you think? CC @luoyuxia @wuchong 

