swuferhong commented on code in PR #2179: URL: https://github.com/apache/fluss/pull/2179#discussion_r2702398182
##########
website/docs/engine-flink/options.md:
##########
@@ -91,20 +91,22 @@ See more details about [ALTER TABLE ... SET](engine-flink/ddl.md#set-properties)
## Read Options
-| Option | Type | Default | Description |
-|-----------------------------------------------------|------------|-------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| scan.startup.mode | Enum | full | The scan startup mode enables you to specify the starting point for data consumption. Fluss currently supports the following `scan.startup.mode` options: `full` (default), earliest, latest, timestamp. See the [Start Reading Position](engine-flink/reads.md#start-reading-position) for more details. |
-| scan.startup.timestamp | Long | (None) | The timestamp to start reading the data from. This option is only valid when `scan.startup.mode` is set to `timestamp`. The format is 'milli-second-since-epoch' or `yyyy-MM-dd HH:mm:ss`, like `1678883047356` or `2023-12-09 23:09:12`. |
-| scan.partition.discovery.interval | Duration | 1min | The time interval for the Fluss source to discover the new partitions for partitioned table while scanning. A non-positive value disables the partition discovery. The default value is 1 minute. Currently, since Fluss Admin#listPartitions(TablePath tablePath) requires a large number of requests to ZooKeeper in server, this option cannot be set too small, as a small value would cause frequent requests and increase server load. In the future, once list partitions is optimized, the default value of this parameter can be reduced. |
-| client.scanner.log.check-crc | Boolean | true | Automatically check the CRC3 of the read records for LogScanner. This ensures no on-the-wire or on-disk corruption to the messages occurred. This check adds some overhead, so it may be disabled in cases seeking extreme performance. |
-| client.scanner.log.max-poll-records | Integer | 500 | The maximum number of records returned in a single call to poll() for LogScanner. Note that this config doesn't impact the underlying fetching behavior. The Scanner will cache the records from each fetch request and returns them incrementally from each poll. |
-| client.scanner.log.fetch.max-bytes | MemorySize | 16mb | The maximum amount of data the server should return for a fetch request from client. Records are fetched in batches, and if the first record batch in the first non-empty bucket of the fetch is larger than this value, the record batch will still be returned to ensure that the fetch can make progress. As such, this is not a absolute maximum. |
-| client.scanner.log.fetch.max-bytes-for-bucket | MemorySize | 1mb | The maximum amount of data the server should return for a table bucket in fetch request fom client. Records are fetched in batches, and the max bytes size is config by this option. |
-| client.scanner.log.fetch.min-bytes | MemorySize | 1b | The minimum bytes expected for each fetch log request from client to response. If not enough bytes, wait up to client.scanner.log.fetch-wait-max-time time to return. |
-| client.scanner.log.fetch.wait-max-time | Duration | 500ms | The maximum time to wait for enough bytes to be available for a fetch log request from client to response. |
-| client.scanner.io.tmpdir | String | System.getProperty("java.io.tmpdir") + "/fluss" | Local directory that is used by client for storing the data files (like kv snapshot, log segment files) to read temporarily |
-| client.scanner.remote-log.prefetch-num | Integer | 4 | The number of remote log segments to keep in local temp file for LogScanner, which download from remote storage. The default setting is 4. |
-| client.remote-file.download-thread-num | Integer | 3 | The number of threads the client uses to download remote files. |
+| Option | Type | Default | Description |
+|-----------------------------------------------|------------|-------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| scan.startup.mode | Enum | full | The scan startup mode enables you to specify the starting point for data consumption. Fluss currently supports the following `scan.startup.mode` options: `full` (default), earliest, latest, timestamp. See the [Start Reading Position](engine-flink/reads.md#start-reading-position) for more details. |
+| scan.startup.timestamp | Long | (None) | The timestamp to start reading the data from. This option is only valid when `scan.startup.mode` is set to `timestamp`. The format is 'milli-second-since-epoch' or `yyyy-MM-dd HH:mm:ss`, like `1678883047356` or `2023-12-09 23:09:12`. |
+| scan.partition.discovery.interval | Duration | 1min | The time interval for the Fluss source to discover the new partitions for partitioned table while scanning. A non-positive value disables the partition discovery. The default value is 1 minute. Currently, since Fluss Admin#listPartitions(TablePath tablePath) requires a large number of requests to ZooKeeper in server, this option cannot be set too small, as a small value would cause frequent requests and increase server load. In the future, once list partitions is optimized, the default value of this parameter can be reduced. |
+| scan.kv.snapshot.lease.id | String | UUID | The lease id to lease kv snapshots. If set, the acquired kv snapshots will not be deleted until the consumer finished consuming all the snapshots or the lease duration time is reached. If not set, an UUID will be set. |
+| scan.kv.snapshot.lease.duration | Duration | 1day | The time period how long to wait before expiring the kv snapshot lease to avoid kv snapshot blocking to delete. |
+| client.scanner.log.check-crc | Boolean | true | Automatically check the CRC3 of the read records for LogScanner. This ensures no on-the-wire or on-disk corruption to the messages occurred. This check adds some overhead, so it may be disabled in cases seeking extreme performance. |
+| client.scanner.log.max-poll-records | Integer | 500 | The maximum number of records returned in a single call to poll() for LogScanner. Note that this config doesn't impact the underlying fetching behavior. The Scanner will cache the records from each fetch request and returns them incrementally from each poll. |
+| client.scanner.log.fetch.max-bytes | MemorySize | 16mb | The maximum amount of data the server should return for a fetch request from client. Records are fetched in batches, and if the first record batch in the first non-empty bucket of the fetch is larger than this value, the record batch will still be returned to ensure that the fetch can make progress. As such, this is not a absolute maximum. |
+| client.scanner.log.fetch.max-bytes-for-bucket | MemorySize | 1mb | The maximum amount of data the server should return for a table bucket in fetch request fom client. Records are fetched in batches, and the max bytes size is config by this option. |
+| client.scanner.log.fetch.min-bytes | MemorySize | 1b | The minimum bytes expected for each fetch log request from client to response. If not enough bytes, wait up to client.scanner.log.fetch-wait-max-time time to return. |
+| client.scanner.log.fetch.wait-max-time | Duration | 500ms | The maximum time to wait for enough bytes to be available for a fetch log request from client to response. |
+| client.scanner.io.tmpdir | String | System.getProperty("java.io.tmpdir") + "/fluss" | Local directory that is used by client for storing the data files (like kv snapshot, log segment files) to read temporarily |
+| client.scanner.remote-log.prefetch-num | Integer | 4 | The number of remote log segments to keep in local temp file for LogScanner, which download from remote storage. The default setting is 4. |
Review Comment:
This formatting was applied by the IDEA checkstyle formatter.
