Copilot commented on code in PR #2184:
URL: https://github.com/apache/fluss/pull/2184#discussion_r2634354376
##########
fluss-flink/fluss-flink-tiering/src/main/java/org/apache/fluss/flink/tiering/FlussLakeTieringEntrypoint.java:
##########
@@ -30,11 +30,13 @@
import static org.apache.flink.runtime.executiongraph.failover.FailoverStrategyFactoryLoader.FULL_RESTART_STRATEGY_NAME;
import static org.apache.fluss.flink.tiering.source.TieringSourceOptions.DATA_LAKE_CONFIG_PREFIX;
import static org.apache.fluss.utils.PropertiesUtils.extractAndRemovePrefix;
+import static org.apache.fluss.utils.PropertiesUtils.extractPrefix;
/** The entrypoint for Flink to tier fluss data to lake format like paimon. */
public class FlussLakeTieringEntrypoint {
private static final String FLUSS_CONF_PREFIX = "fluss.";
+ private static final String LAKE_TIERING_CONFIG_PREFIX = "lake.teiring.";
Review Comment:
Typo in the word "tiering": the prefix is misspelled as "lake.teiring." but
should be "lake.tiering." to match the actual config option key defined in
ConfigOptions.LAKE_TIERING_AUTO_EXPIRE_SNAPSHOT.
```suggestion
private static final String LAKE_TIERING_CONFIG_PREFIX = "lake.tiering.";
```
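For illustration, here is a minimal sketch of why the misspelling matters: prefix-based extraction silently matches nothing, so the option is ignored instead of being rejected. `selectByPrefix` is a hypothetical stand-in written for this example. It is not the actual `PropertiesUtils.extractPrefix` signature, and it keeps the full keys.

```java
import java.util.HashMap;
import java.util.Map;

public class TieringPrefixDemo {

    // Hypothetical helper: returns the entries whose key starts with the prefix.
    // This is NOT Fluss's PropertiesUtils API; it only mimics prefix filtering.
    public static Map<String, String> selectByPrefix(Map<String, String> conf, String prefix) {
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative entrypoint arguments (values are examples only).
        Map<String, String> conf = new HashMap<>();
        conf.put("lake.tiering.auto-expire-snapshot", "true");
        conf.put("fluss.bootstrap.servers", "localhost:9123");

        // Misspelled prefix: no key matches, so the option is silently dropped.
        System.out.println(selectByPrefix(conf, "lake.teiring.")); // prints {}
        // Correct prefix: the option is picked up.
        System.out.println(selectByPrefix(conf, "lake.tiering.")); // prints {lake.tiering.auto-expire-snapshot=true}
    }
}
```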
##########
website/docs/engine-flink/options.md:
##########
@@ -81,7 +81,8 @@ See more details about [ALTER TABLE ... SET](engine-flink/ddl.md#set-properties)
| table.datalake.enabled | Boolean | false | Whether enable lakehouse storage for the table. Disabled by default. When this option is set to true and the datalake tiering service is up, the table will be tiered and compacted into datalake format stored on lakehouse storage. |
| table.datalake.format | Enum | (None) | The data lake format of the table specifies the tiered Lakehouse storage format. Currently, supported formats are `paimon`, `iceberg`, and `lance`. In the future, more kinds of data lake format will be supported, such as DeltaLake or Hudi. Once the `table.datalake.format` property is configured, Fluss adopts the key encoding and bucketing strategy used by the corresponding data lake format. This ensures consistency in key encoding and bucketing, enabling seamless **Union Read** functionality across Fluss and Lakehouse. The `table.datalake.format` can be pre-defined before enabling `table.datalake.enabled`. This allows the data lake feature to be dynamically enabled on the table without requiring table recreation. If `table.datalake.format` is not explicitly set during table creation, the table will default to the format specified by the `datalake.format` configuration in the Fluss cluster. |
| table.datalake.freshness | Duration | 3min | It defines the maximum amount of time that the datalake table's content should lag behind updates to the Fluss table. Based on this target freshness, the Fluss service automatically moves data from the Fluss table and updates to the datalake table, so that the data in the datalake table is kept up to date within this target. If the data does not need to be as fresh, you can specify a longer target freshness time to reduce costs. |
-| table.datalake.auto-compaction | Boolean | false | If true, compaction will be triggered automatically when tiering service writes to the datalake. It is disabled by default. |
+| table.datalake.auto-compaction | Boolean | false | If true, compaction will be triggered automatically when tiering service writes to the datalake. It is disabled by default. |
+| table.datalake.auto-expire-snapshot | Boolean | false | If true, snapshot expiration will be triggered automatically when tiering service writes to the datalake. It is disabled by default. |
Review Comment:
The description is inconsistent with the ConfigOptions source code. It
should say "tiering service commits to the datalake" instead of "tiering
service writes to the datalake" to match the definition in
ConfigOptions.TABLE_DATALAKE_AUTO_EXPIRE_SNAPSHOT. Snapshot expiration is
triggered after committing, not during writing.
```suggestion
| table.datalake.auto-expire-snapshot | Boolean | false | If true, snapshot expiration will be triggered automatically when tiering service commits to the datalake. It is disabled by default. |
```
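To make the ordering concrete, here is a minimal sketch of a committer that only considers expiration once a commit has produced a snapshot (in formats such as Paimon and Iceberg, a new snapshot is created at commit time). `ExpireAfterCommitSketch` and its methods are hypothetical and do not mirror Fluss's tiering committer. The only real name used is the option key `table.datalake.auto-expire-snapshot`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ExpireAfterCommitSketch {

    // Records the order of lake operations so the sequencing is visible.
    private final List<String> events = new ArrayList<>();
    private final boolean autoExpireSnapshot;

    public ExpireAfterCommitSketch(Map<String, String> tableProperties) {
        // A missing value falls back to the documented default (false).
        this.autoExpireSnapshot =
                Boolean.parseBoolean(
                        tableProperties.getOrDefault("table.datalake.auto-expire-snapshot", "false"));
    }

    public void write(String batch) {
        // Writing only stages data; no new snapshot exists yet, so nothing is expired here.
        events.add("write:" + batch);
    }

    public void commit() {
        // The commit produces a new lake snapshot...
        events.add("commit");
        // ...and only afterwards is snapshot expiration considered.
        if (autoExpireSnapshot) {
            events.add("expire-snapshots");
        }
    }

    public List<String> events() {
        return events;
    }

    public static void main(String[] args) {
        ExpireAfterCommitSketch sketch =
                new ExpireAfterCommitSketch(Map.of("table.datalake.auto-expire-snapshot", "true"));
        sketch.write("b1");
        sketch.write("b2");
        sketch.commit();
        System.out.println(sketch.events()); // prints [write:b1, write:b2, commit, expire-snapshots]
    }
}
```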
##########
website/docs/maintenance/tiered-storage/lakehouse-storage.md:
##########
@@ -102,4 +102,12 @@ To enable lakehouse storage for a table, the table must be created with the opti
Another option `table.datalake.freshness`, allows per-table configuration of data freshness in the datalake.
It defines the maximum amount of time that the datalake table's content should lag behind updates to the Fluss table.
Based on this target freshness, the Fluss tiering service automatically moves data from the Fluss table and updates to the datalake table, so that the data in the datalake table is kept up to date within this target.
-The default is `3min`, if the data does not need to be as fresh, you can specify a longer target freshness time to reduce costs.
\ No newline at end of file
+The default is `3min`, if the data does not need to be as fresh, you can specify a longer target freshness time to reduce costs.
+
+# Datalake Tiering Service Options
+
+The following table lists the options that can be used to configure the datalake tiering service.
+
+| Option | Type | Default | Description |
+|-----------------------------------------|----------|---------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| lake.tiering.auto-expire-snapshot | Boolean | false | If true, snapshot expiration will be triggered automatically when tiering service commits to the datalake, event if `table.datalake.auto-expire-snapshot` is false. |
Review Comment:
Typo in the word "even": "event" should be "even".
```suggestion
| lake.tiering.auto-expire-snapshot | Boolean | false | If true, snapshot expiration will be triggered automatically when tiering service commits to the datalake, even if `table.datalake.auto-expire-snapshot` is false. |
```
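Read together with the table-level option, the documented behavior amounts to an OR of the two flags. Below is a minimal sketch, assuming both options arrive as plain `Map<String, String>` entries that default to `false`. The class and method names are hypothetical, not Fluss APIs.

```java
import java.util.Map;

public class AutoExpireResolutionSketch {

    static final String TABLE_OPTION = "table.datalake.auto-expire-snapshot";
    static final String SERVICE_OPTION = "lake.tiering.auto-expire-snapshot";

    // Reads a boolean option, falling back to the documented default of false.
    static boolean flag(Map<String, String> conf, String key) {
        return Boolean.parseBoolean(conf.getOrDefault(key, "false"));
    }

    // Hypothetical resolution rule: the service-level option forces expiration
    // for every table; otherwise the table-level option decides.
    public static boolean shouldExpireSnapshots(
            Map<String, String> tableProperties, Map<String, String> tieringServiceConf) {
        return flag(tieringServiceConf, SERVICE_OPTION) || flag(tableProperties, TABLE_OPTION);
    }

    public static void main(String[] args) {
        Map<String, String> table = Map.of(TABLE_OPTION, "false");
        // Service-level flag wins even though the table disables expiration.
        System.out.println(shouldExpireSnapshots(table, Map.of(SERVICE_OPTION, "true"))); // prints true
        // Neither flag set: expiration stays off.
        System.out.println(shouldExpireSnapshots(table, Map.of())); // prints false
    }
}
```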
--
This is an automated message from the Apache Git Service.