Re: [PR] [FLINK-36439][docs] Documents for Disaggregate State and new State APIs [flink]

via GitHub Thu, 06 Feb 2025 08:01:34 -0800


davidradl commented on code in PR #26107:
URL: https://github.com/apache/flink/pull/26107#discussion_r1944989494



##########
docs/layouts/shortcodes/generated/forst_configurable_configuration.html:
##########
@@ -0,0 +1,168 @@
+<table class="configuration table table-bordered">
+    <thead>
+        <tr>
+            <th class="text-left" style="width: 20%">Key</th>
+            <th class="text-left" style="width: 15%">Default</th>
+            <th class="text-left" style="width: 10%">Type</th>
+            <th class="text-left" style="width: 55%">Description</th>
+        </tr>
+    </thead>
+    <tbody>
+        <tr>
+            <td><h5>state.backend.forst.block.blocksize</h5></td>
+            <td style="word-wrap: break-word;">4 kb</td>
+            <td>MemorySize</td>
+            <td>The approximate size (in bytes) of user data packed per block. 
The default blocksize is '4KB'.</td>
+        </tr>
+        <tr>
+            <td><h5>state.backend.forst.block.cache-size</h5></td>
+            <td style="word-wrap: break-word;">8 mb</td>
+            <td>MemorySize</td>
+            <td>The amount of the cache for data blocks in ForSt. The default 
block-cache size is '8MB'.</td>
+        </tr>
+        <tr>
+            <td><h5>state.backend.forst.block.metadata-blocksize</h5></td>
+            <td style="word-wrap: break-word;">4 kb</td>
+            <td>MemorySize</td>
+            <td>Approximate size of partitioned metadata packed per block. 
Currently applied to indexes block when partitioned index/filters option is 
enabled. The default blocksize is '4KB'.</td>
+        </tr>
+        <tr>
+            <td><h5>state.backend.forst.bloom-filter.bits-per-key</h5></td>
+            <td style="word-wrap: break-word;">10.0</td>
+            <td>Double</td>
+            <td>Bits per key that bloom filter will use, this only take effect 
when bloom filter is used. The default value is 10.0.</td>
+        </tr>
+        <tr>
+            <td><h5>state.backend.forst.bloom-filter.block-based-mode</h5></td>
+            <td style="word-wrap: break-word;">false</td>
+            <td>Boolean</td>
+            <td>If true, ForSt will use block-based filter instead of full 
filter, this only take effect when bloom filter is used. The default value is 
'false'.</td>
+        </tr>
+        <tr>
+            
<td><h5>state.backend.forst.compaction.filter.periodic-compaction-time</h5></td>
+            <td style="word-wrap: break-word;">30 d</td>
+            <td>Duration</td>
+            <td>Periodic compaction could speed up expired state entries 
cleanup, especially for state entries rarely accessed. Files older than this 
value will be picked up for compaction, and re-written to the same level as 
they were before. It makes sure a file goes through compaction filters 
periodically. 0 means turning off periodic compaction.The default value is 
'30days'.</td>
+        </tr>
+        <tr>
+            
<td><h5>state.backend.forst.compaction.filter.query-time-after-num-entries</h5></td>
+            <td style="word-wrap: break-word;">1000</td>
+            <td>Long</td>
+            <td>Number of state entries to process by compaction filter before 
updating current timestamp. Updating the timestamp more often can improve 
cleanup speed, but it decreases compaction performance because it uses JNI 
calls from native code.The default value is '1000L'.</td>
+        </tr>
+        <tr>
+            
<td><h5>state.backend.forst.compaction.level.max-size-level-base</h5></td>
+            <td style="word-wrap: break-word;">256 mb</td>
+            <td>MemorySize</td>
+            <td>The upper-bound of the total size of level base files in 
bytes. The default value is '256MB'.</td>
+        </tr>
+        <tr>
+            
<td><h5>state.backend.forst.compaction.level.target-file-size-base</h5></td>
+            <td style="word-wrap: break-word;">64 mb</td>
+            <td>MemorySize</td>
+            <td>The target file size for compaction, which determines a 
level-1 file size. The default value is '64MB'.</td>
+        </tr>
+        <tr>
+            
<td><h5>state.backend.forst.compaction.level.use-dynamic-size</h5></td>
+            <td style="word-wrap: break-word;">false</td>
+            <td>Boolean</td>
+            <td>If true, ForSt will pick target size of each level 
dynamically. From an empty DB, ForSt would make last level the base level, 
which means merging L0 data into the last level, until it exceeds 
max_bytes_for_level_base. And then repeat this process for second last level 
and so on. The default value is 'false'. For more information, please refer to 
<a 
href="https://github.com/facebook/rocksdb/wiki/Leveled-Compaction#level_compaction_dynamic_level_bytes-is-true";>RocksDB's
 doc.</a></td>
+        </tr>
+        <tr>
+            <td><h5>state.backend.forst.compaction.style</h5></td>
+            <td style="word-wrap: break-word;">LEVEL</td>
+            <td><p>Enum</p></td>
+            <td>The specified compaction style for DB. Candidate compaction 
style is LEVEL, FIFO, UNIVERSAL or NONE, and Flink chooses 'LEVEL' as default 
style.<br /><br />Possible 
values:<ul><li>"LEVEL"</li><li>"UNIVERSAL"</li><li>"FIFO"</li><li>"NONE"</li></ul></td>
+        </tr>
+        <tr>
+            <td><h5>state.backend.forst.compression.per.level</h5></td>
+            <td style="word-wrap: break-word;">SNAPPY_COMPRESSION</td>
+            <td><p>List&lt;Enum&gt;</p></td>
+            <td>A semicolon-separated list of Compression Type. Different 
levels can have different compression policies. In many cases, lower levels use 
fast compression algorithms, while higher levels with more data use slower but 
more effective compression algorithms. The N th element in the List corresponds 
to the compression type of the level N-1When <code 
class="highlighter-rouge">state.backend.forst.compaction.level.use-dynamic-size</code>
 is true, compression_per_level[0] still determines L0, but other elements are 
based on the base level and may not match the level seen in the info log<br 
/>Note: If the List size is smaller than the level number, the undefined lower 
level uses the last Compression Type in the List<br />Some commonly used 
compression algorithms for candidates include <code 
class="highlighter-rouge">NO_COMPRESSION</code> ,<code 
class="highlighter-rouge">SNAPPY_COMPRESSION</code> and <code 
class="highlighter-rouge">LZ4_COMPRESSION</code><br />The default value
  is <code class="highlighter-rouge">SNAPPY_COMPRESSION</code>, which means 
that all data uses the Snappy compression algorithm.Likewise, if set to <code 
class="highlighter-rouge">NO_COMPRESSION</code> , means that all data is not 
compressed, which will achieve faster speed but will bring some space 
amplification.In addition, if we need to consider both spatial amplification 
and performance, we can also set it to '<code 
class="highlighter-rouge">NO_COMPRESSION</code>;<code 
class="highlighter-rouge">NO_COMPRESSION</code>;<code 
class="highlighter-rouge">LZ4_COMPRESSION</code>', which means that L0 and L1 
data will not be compressed, and other data will be compressed using LZ4.<br 
/><br />Possible 
values:<ul><li>"NO_COMPRESSION"</li><li>"SNAPPY_COMPRESSION"</li><li>"ZLIB_COMPRESSION"</li><li>"BZLIB2_COMPRESSION"</li><li>"LZ4_COMPRESSION"</li><li>"LZ4HC_COMPRESSION"</li><li>"XPRESS_COMPRESSION"</li><li>"ZSTD_COMPRESSION"</li><li>"DISABLE_COMPRESSION_OPTION"</li></ul></td>
+        </tr>
+        <tr>
+            <td><h5>state.backend.forst.files.open</h5></td>
+            <td style="word-wrap: break-word;">-1</td>
+            <td>Integer</td>
+            <td>The maximum number of open files (per stateful operator) that 
can be used by the DB, '-1' means no limit. The default value is '-1'.</td>
+        </tr>
+        <tr>
+            <td><h5>state.backend.forst.log.dir</h5></td>
+            <td style="word-wrap: break-word;">(none)</td>
+            <td>String</td>
+            <td>The directory for ForSt's information logging files. If empty 
(Flink default setting), log files will be in the same directory as the Flink 
log. If non-empty, this directory will be used and the data directory's 
absolute path will be used as the prefix of the log file name. If setting this 
option as a non-existing location, e.g '/dev/null', ForSt will then create the 
log under its own database folder as before.</td>
+        </tr>
+        <tr>
+            <td><h5>state.backend.forst.log.file-num</h5></td>
+            <td style="word-wrap: break-word;">4</td>
+            <td>Integer</td>
+            <td>The maximum number of files ForSt should keep for information 
logging (Default setting: 4).</td>
+        </tr>
+        <tr>
+            <td><h5>state.backend.forst.log.level</h5></td>
+            <td style="word-wrap: break-word;">INFO_LEVEL</td>
+            <td><p>Enum</p></td>
+            <td>The specified information logging level for ForSt. If unset, 
Flink will use <code class="highlighter-rouge">INFO_LEVEL</code>.<br />Note: 
ForSt info logs will not be written to the TaskManager logs and there is no 
rolling strategy, unless you configure <code 
class="highlighter-rouge">state.backend.forst.log.dir</code>, <code 
class="highlighter-rouge">state.backend.forst.log.max-file-size</code>, and 
<code class="highlighter-rouge">state.backend.forst.log.file-num</code> 
accordingly. Without a rolling strategy, long-running tasks may lead to 
uncontrolled disk space usage if configured with increased log levels!<br 
/>There is no need to modify the ForSt log level, unless for troubleshooting 
ForSt.<br /><br />Possible 
values:<ul><li>"DEBUG_LEVEL"</li><li>"INFO_LEVEL"</li><li>"WARN_LEVEL"</li><li>"ERROR_LEVEL"</li><li>"FATAL_LEVEL"</li><li>"HEADER_LEVEL"</li><li>"NUM_INFO_LOG_LEVELS"</li></ul></td>
+        </tr>
+        <tr>
+            <td><h5>state.backend.forst.log.max-file-size</h5></td>
+            <td style="word-wrap: break-word;">25 mb</td>
+            <td>MemorySize</td>
+            <td>The maximum size of ForSt's file used for information logging. 
If the log files becomes larger than this, a new file will be created. If 0, 
all logs will be written to one log file. The default maximum file size is 
'25MB'. </td>
+        </tr>
+        <tr>
+            
<td><h5>state.backend.forst.rescaling.use-delete-files-in-range</h5></td>
+            <td style="word-wrap: break-word;">false</td>
+            <td>Boolean</td>
+            <td>If true, during rescaling, the deleteFilesInRange API will be 
invoked to clean up the useless files so that local disk space can be reclaimed 
more promptly.</td>
+        </tr>
+        <tr>
+            
<td><h5>state.backend.forst.restore-overlap-fraction-threshold</h5></td>
+            <td style="word-wrap: break-word;">0.0</td>
+            <td>Double</td>
+            <td>The threshold of overlap fraction between the handle's 
key-group range and target key-group range. When restore base DB, only the 
handle which overlap fraction greater than or equal to threshold has a chance 
to be an initial handle. The default value is 0.0, there is always a handle 
will be selected for initialization. </td>
+        </tr>
+        <tr>
+            <td><h5>state.backend.forst.thread.num</h5></td>
+            <td style="word-wrap: break-word;">2</td>
+            <td>Integer</td>
+            <td>The maximum number of concurrent background flush and 
compaction jobs (per stateful operator). The default value is '2'.</td>
+        </tr>
+        <tr>
+            <td><h5>state.backend.forst.use-bloom-filter</h5></td>
+            <td style="word-wrap: break-word;">false</td>
+            <td>Boolean</td>
+            <td>If true, every newly created SST file will contain a Bloom 
filter. It is disabled by default.</td>
+        </tr>
+        <tr>
+            <td><h5>state.backend.forst.use-ingest-db-restore-mode</h5></td>
+            <td style="word-wrap: break-word;">false</td>
+            <td>Boolean</td>
+            <td>A recovery mode that directly clips and ingests multiple DBs 
during state recovery if the keys in the SST files does not exceed the declared 
key-group range.</td>

Review Comment:
   I assume the test refers to whetehr this is true. What about when it is 
false. Also guidance on this would be good.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [FLINK-36439][docs] Documents for Disaggregate State and new State APIs [flink]

Reply via email to