Re: [PR] [lake] Support auto snapshot expiration for lake table [fluss]

via GitHub Fri, 19 Dec 2025 01:28:39 -0800


Copilot commented on code in PR #2184:
URL: https://github.com/apache/fluss/pull/2184#discussion_r2634354376



##########
fluss-flink/fluss-flink-tiering/src/main/java/org/apache/fluss/flink/tiering/FlussLakeTieringEntrypoint.java:
##########
@@ -30,11 +30,13 @@
 import static org.apache.flink.runtime.executiongraph.failover.FailoverStrategyFactoryLoader.FULL_RESTART_STRATEGY_NAME;
 import static org.apache.fluss.flink.tiering.source.TieringSourceOptions.DATA_LAKE_CONFIG_PREFIX;
 import static org.apache.fluss.utils.PropertiesUtils.extractAndRemovePrefix;
+import static org.apache.fluss.utils.PropertiesUtils.extractPrefix;
 
 /** The entrypoint for Flink to tier fluss data to lake format like paimon. */
 public class FlussLakeTieringEntrypoint {
 
     private static final String FLUSS_CONF_PREFIX = "fluss.";
+    private static final String LAKE_TIERING_CONFIG_PREFIX = "lake.teiring.";

Review Comment:
   Typo in the word "tiering": the prefix is misspelled as "lake.teiring." but 
should be "lake.tiering." to match the actual config option key defined in 
ConfigOptions.LAKE_TIERING_AUTO_EXPIRE_SNAPSHOT.
   ```suggestion
       private static final String LAKE_TIERING_CONFIG_PREFIX = "lake.tiering.";
   ```
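
To illustrate why the misspelling matters, here is a minimal sketch of prefix-based option extraction. The helper `keysWithPrefix` is a hypothetical stand-in, not the actual `PropertiesUtils.extractPrefix` from Fluss (whose exact behavior, such as whether it strips the prefix, is not shown in this diff), and the sample keys and values are illustrative only. Under those assumptions, a misspelled prefix matches no keys, so the option would most likely be ignored without any error.

```java
import java.util.HashMap;
import java.util.Map;

public class PrefixTypoSketch {

    // Hypothetical helper: returns every entry whose key starts with the prefix.
    // This is NOT the Fluss PropertiesUtils.extractPrefix implementation.
    public static Map<String, String> keysWithPrefix(Map<String, String> props, String prefix) {
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, String> entry : props.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative job arguments; the values are assumptions for this sketch.
        Map<String, String> jobArgs = new HashMap<>();
        jobArgs.put("lake.tiering.auto-expire-snapshot", "true");
        jobArgs.put("fluss.bootstrap.servers", "localhost:9123");

        // The misspelled prefix matches nothing, the corrected one matches one key.
        System.out.println(keysWithPrefix(jobArgs, "lake.teiring.").size()); // prints 0
        System.out.println(keysWithPrefix(jobArgs, "lake.tiering.").size()); // prints 1
    }
}
```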



##########
website/docs/engine-flink/options.md:
##########
@@ -81,7 +81,8 @@ See more details about [ALTER TABLE ... SET](engine-flink/ddl.md#set-properties)
 | table.datalake.enabled                  | Boolean  | false                   | Whether enable lakehouse storage for the table. Disabled by default. When this option is set to ture and the datalake tiering service is up, the table will be tiered and compacted into datalake format stored on lakehouse storage. |
 | table.datalake.format                   | Enum     | (None)                  | The data lake format of the table specifies the tiered Lakehouse storage format. Currently, supported formats are `paimon`, `iceberg`, and `lance`. In the future, more kinds of data lake format will be supported, such as DeltaLake or Hudi. Once the `table.datalake.format` property is configured, Fluss adopts the key encoding and bucketing strategy used by the corresponding data lake format. This ensures consistency in key encoding and bucketing, enabling seamless **Union Read** functionality across Fluss and Lakehouse. The `table.datalake.format` can be pre-defined before enabling `table.datalake.enabled`. This allows the data lake feature to be dynamically enabled on the table without requiring table recreation. If `table.datalake.format` is not explicitly set during table creation, the table will default to the format specified by the `datalake.format` configuration in the Fluss cluster. |
 | table.datalake.freshness                | Duration | 3min                    | It defines the maximum amount of time that the datalake table's content should lag behind updates to the Fluss table. Based on this target freshness, the Fluss service automatically moves data from the Fluss table and updates to the datalake table, so that the data in the datalake table is kept up to date within this target. If the data does not need to be as fresh, you can specify a longer target freshness time to reduce costs. |
-| table.datalake.auto-compaction          | Boolean | false                    | If true, compaction will be triggered automatically when tiering service writes to the datalake. It is disabled by default. |
+| table.datalake.auto-compaction          | Boolean  | false                   | If true, compaction will be triggered automatically when tiering service writes to the datalake. It is disabled by default. |
+| table.datalake.auto-expire-snapshot     | Boolean  | false                   | If true, snapshot expiration will be triggered automatically when tiering service writes to the datalake. It is disabled by default. |

Review Comment:
   The description is inconsistent with the ConfigOptions source code. It 
should say "tiering service commits to the datalake" instead of "tiering 
service writes to the datalake" to match the definition in 
ConfigOptions.TABLE_DATALAKE_AUTO_EXPIRE_SNAPSHOT. Snapshot expiration is 
triggered after committing, not during writing.
   ```suggestion
   | table.datalake.auto-expire-snapshot     | Boolean  | false                   | If true, snapshot expiration will be triggered automatically when tiering service commits to the datalake. It is disabled by default. |
   ```



##########
website/docs/maintenance/tiered-storage/lakehouse-storage.md:
##########
@@ -102,4 +102,12 @@ To enable lakehouse storage for a table, the table must be created with the opti
 Another option `table.datalake.freshness`, allows per-table configuration of data freshness in the datalake.
 It defines the maximum amount of time that the datalake table's content should lag behind updates to the Fluss table. 
 Based on this target freshness, the Fluss tiering service automatically moves data from the Fluss table and updates to the datalake table, so that the data in the datalake table is kept up to date within this target.
-The default is `3min`, if the data does not need to be as fresh, you can specify a longer target freshness time to reduce costs.
\ No newline at end of file
+The default is `3min`, if the data does not need to be as fresh, you can specify a longer target freshness time to reduce costs.
+
+# Datalake Tiering Service Options
+
+The following table lists the options that can be used to configure the datalake tiering service.
+
+| Option                                  | Type     | Default | Description |
+|-----------------------------------------|----------|---------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| lake.tiering.auto-expire-snapshot       | Boolean  | false   | If true, snapshot expiration will be triggered automatically when tiering service commits to the datalake, event if `table.datalake.auto-expire-snapshot` is false. |

Review Comment:
   Typo in the word "even": "event" should be "even".
   ```suggestion
   | lake.tiering.auto-expire-snapshot       | Boolean  | false   | If true, snapshot expiration will be triggered automatically when tiering service commits to the datalake, even if `table.datalake.auto-expire-snapshot` is false. |
   ```
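
For readers of the docs, the relationship between the two options described in this row can be sketched as a simple logical OR. This is an assumption drawn only from the wording "even if `table.datalake.auto-expire-snapshot` is false"; the class and method names below are hypothetical and are not taken from the PR.

```java
public class AutoExpireSketch {

    // Assumed combination rule (not confirmed by the diff shown here):
    // the service-level option forces expiration on, otherwise the
    // table-level option decides.
    public static boolean shouldExpireSnapshots(
            boolean lakeTieringAutoExpire, boolean tableAutoExpire) {
        return lakeTieringAutoExpire || tableAutoExpire;
    }

    public static void main(String[] args) {
        // Service-level option enabled, table-level option disabled.
        System.out.println(shouldExpireSnapshots(true, false));  // prints true
        // Both disabled: no automatic expiration.
        System.out.println(shouldExpireSnapshots(false, false)); // prints false
    }
}
```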



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

