Re: [PR] [kv] Support kv snapshot lease [fluss]

via GitHub Sun, 18 Jan 2026 05:05:39 -0800


swuferhong commented on code in PR #2179:
URL: https://github.com/apache/fluss/pull/2179#discussion_r2702398182



##########
website/docs/engine-flink/options.md:
##########
@@ -91,20 +91,22 @@ See more details about [ALTER TABLE ... SET](engine-flink/ddl.md#set-properties)
 
 ## Read Options
 
-| Option                                              | Type       | Default   
                                      | Description                             
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
           |
-|-----------------------------------------------------|------------|-------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| scan.startup.mode                                   | Enum       | full      
                                      | The scan startup mode enables you to 
specify the starting point for data consumption. Fluss currently supports the 
following `scan.startup.mode` options: `full` (default), earliest, latest, 
timestamp. See the [Start Reading 
Position](engine-flink/reads.md#start-reading-position) for more details.       
                                                                                
                                                                                
                                                                   |
-| scan.startup.timestamp                              | Long       | (None)    
                                      | The timestamp to start reading the data 
from. This option is only valid when `scan.startup.mode` is set to `timestamp`. 
The format is 'milli-second-since-epoch' or `yyyy-MM-dd HH:mm:ss`, like 
`1678883047356` or `2023-12-09 23:09:12`.                                       
                                                                                
                                                                                
                                                                                
                   |
-| scan.partition.discovery.interval                   | Duration   | 1min      
                                      | The time interval for the Fluss source 
to discover the new partitions for partitioned table while scanning. A 
non-positive value disables the partition discovery. The default value is 1 
minute. Currently, since Fluss Admin#listPartitions(TablePath tablePath) 
requires a large number of requests to ZooKeeper in server, this option cannot 
be set too small, as a small value would cause frequent requests and increase 
server load. In the future, once list partitions is optimized, the default 
value of this parameter can be reduced. |
-| client.scanner.log.check-crc                        | Boolean    | true      
                                      | Automatically check the CRC3 of the 
read records for LogScanner. This ensures no on-the-wire or on-disk corruption 
to the messages occurred. This check adds some overhead, so it may be disabled 
in cases seeking extreme performance.                                           
                                                                                
                                                                                
                                                                                
                 |
-| client.scanner.log.max-poll-records                 | Integer    | 500       
                                      | The maximum number of records returned 
in a single call to poll() for LogScanner. Note that this config doesn't impact 
the underlying fetching behavior. The Scanner will cache the records from each 
fetch request and returns them incrementally from each poll.                    
                                                                                
                                                                                
                                                                                
             |
-| client.scanner.log.fetch.max-bytes                  | MemorySize | 16mb      
                                      | The maximum amount of data the server 
should return for a fetch request from client. Records are fetched in batches, 
and if the first record batch in the first non-empty bucket of the fetch is 
larger than this value, the record batch will still be returned to ensure that 
the fetch can make progress. As such, this is not a absolute maximum.           
                                                                                
                                                                                
                   |
-| client.scanner.log.fetch.max-bytes-for-bucket       | MemorySize | 1mb       
                                      | The maximum amount of data the server 
should return for a table bucket in fetch request fom client. Records are 
fetched in batches, and the max bytes size is config by this option.            
                                                                                
                                                                                
                                                                                
                                                                                
                   |
-| client.scanner.log.fetch.min-bytes                  | MemorySize | 1b        
                                      | The minimum bytes expected for each 
fetch log request from client to response. If not enough bytes, wait up to 
client.scanner.log.fetch-wait-max-time time to return.                          
                                                                                
                                                                                
                                                                                
                                                                                
                    |
-| client.scanner.log.fetch.wait-max-time              | Duration   | 500ms     
                                      | The maximum time to wait for enough 
bytes to be available for a fetch log request from client to response.          
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
               |
-| client.scanner.io.tmpdir                            | String     | 
System.getProperty("java.io.tmpdir") + "/fluss" | Local directory that is used 
by client for storing the data files (like kv snapshot, log segment files) to 
read temporarily                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                        |
-| client.scanner.remote-log.prefetch-num              | Integer    | 4         
                                      | The number of remote log segments to 
keep in local temp file for LogScanner, which download from remote storage. The 
default setting is 4.                                                           
                                                                                
                                                                                
                                                                                
                                                                                
              |
-| client.remote-file.download-thread-num              | Integer    | 3         
                                      | The number of threads the client uses 
to download remote files.                                                       
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
             |
+| Option                                        | Type       | Default         
                                | Description                                   
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
     |
+|-----------------------------------------------|------------|-------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| scan.startup.mode                             | Enum       | full            
                                | The scan startup mode enables you to specify 
the starting point for data consumption. Fluss currently supports the following 
`scan.startup.mode` options: `full` (default), earliest, latest, timestamp. See 
the [Start Reading Position](engine-flink/reads.md#start-reading-position) for 
more details.                                                                   
                                                                                
                                                                                
       |
+| scan.startup.timestamp                        | Long       | (None)          
                                | The timestamp to start reading the data from. 
This option is only valid when `scan.startup.mode` is set to `timestamp`. The 
format is 'milli-second-since-epoch' or `yyyy-MM-dd HH:mm:ss`, like 
`1678883047356` or `2023-12-09 23:09:12`.                                       
                                                                                
                                                                                
                                                                                
                   |
+| scan.partition.discovery.interval             | Duration   | 1min            
                                | The time interval for the Fluss source to 
discover the new partitions for partitioned table while scanning. A 
non-positive value disables the partition discovery. The default value is 1 
minute. Currently, since Fluss Admin#listPartitions(TablePath tablePath) 
requires a large number of requests to ZooKeeper in server, this option cannot 
be set too small, as a small value would cause frequent requests and increase 
server load. In the future, once list partitions is optimized, the default 
value of this parameter can be reduced. |
+| scan.kv.snapshot.lease.id                     | String     | UUID            
                                | The lease id to lease kv snapshots. If set, 
the acquired kv snapshots will not be deleted until the consumer finished 
consuming all the snapshots or the lease duration time is reached. If not set, 
an UUID will be set.                                                            
                                                                                
                                                                                
                                                                                
              |
+| scan.kv.snapshot.lease.duration               | Duration   | 1day            
                                | The time period how long to wait before 
expiring the kv snapshot lease to avoid kv snapshot blocking to delete.         
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
           |
+| client.scanner.log.check-crc                  | Boolean    | true            
                                | Automatically check the CRC3 of the read 
records for LogScanner. This ensures no on-the-wire or on-disk corruption to 
the messages occurred. This check adds some overhead, so it may be disabled in 
cases seeking extreme performance.                                              
                                                                                
                                                                                
                                                                                
              |
+| client.scanner.log.max-poll-records           | Integer    | 500             
                                | The maximum number of records returned in a 
single call to poll() for LogScanner. Note that this config doesn't impact the 
underlying fetching behavior. The Scanner will cache the records from each 
fetch request and returns them incrementally from each poll.                    
                                                                                
                                                                                
                                                                                
             |
+| client.scanner.log.fetch.max-bytes            | MemorySize | 16mb            
                                | The maximum amount of data the server should 
return for a fetch request from client. Records are fetched in batches, and if 
the first record batch in the first non-empty bucket of the fetch is larger 
than this value, the record batch will still be returned to ensure that the 
fetch can make progress. As such, this is not a absolute maximum.               
                                                                                
                                                                                
               |
+| client.scanner.log.fetch.max-bytes-for-bucket | MemorySize | 1mb             
                                | The maximum amount of data the server should 
return for a table bucket in fetch request fom client. Records are fetched in 
batches, and the max bytes size is config by this option.                       
                                                                                
                                                                                
                                                                                
                                                                                
        |
+| client.scanner.log.fetch.min-bytes            | MemorySize | 1b              
                                | The minimum bytes expected for each fetch log 
request from client to response. If not enough bytes, wait up to 
client.scanner.log.fetch-wait-max-time time to return.                          
                                                                                
                                                                                
                                                                                
                                                                                
                    |
+| client.scanner.log.fetch.wait-max-time        | Duration   | 500ms           
                                | The maximum time to wait for enough bytes to 
be available for a fetch log request from client to response.                   
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
      |
+| client.scanner.io.tmpdir                      | String     | 
System.getProperty("java.io.tmpdir") + "/fluss" | Local directory that is used 
by client for storing the data files (like kv snapshot, log segment files) to 
read temporarily                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                        |
+| client.scanner.remote-log.prefetch-num        | Integer    | 4               
                                | The number of remote log segments to keep in 
local temp file for LogScanner, which download from remote storage. The default 
setting is 4.                                                                   
                                                                                
                                                                                
                                                                                
                                                                                
      |

Review Comment:
   This was reformatted by IDEA's check style.


