Re: [PR] [FLINK-36439][docs] Documents for Disaggregate State and new State APIs [flink]

via GitHub Thu, 06 Feb 2025 07:53:21 -0800


davidradl commented on code in PR #26107:
URL: https://github.com/apache/flink/pull/26107#discussion_r1944973204



##########
docs/layouts/shortcodes/generated/forst_configurable_configuration.html:
##########
@@ -0,0 +1,168 @@
+<table class="configuration table table-bordered">
+    <thead>
+        <tr>
+            <th class="text-left" style="width: 20%">Key</th>
+            <th class="text-left" style="width: 15%">Default</th>
+            <th class="text-left" style="width: 10%">Type</th>
+            <th class="text-left" style="width: 55%">Description</th>
+        </tr>
+    </thead>
+    <tbody>
+        <tr>
+            <td><h5>state.backend.forst.block.blocksize</h5></td>
+            <td style="word-wrap: break-word;">4 kb</td>
+            <td>MemorySize</td>
+            <td>The approximate size (in bytes) of user data packed per block. 
The default blocksize is '4KB'.</td>
+        </tr>
+        <tr>
+            <td><h5>state.backend.forst.block.cache-size</h5></td>
+            <td style="word-wrap: break-word;">8 mb</td>
+            <td>MemorySize</td>
+            <td>The amount of the cache for data blocks in ForSt. The default 
block-cache size is '8MB'.</td>
+        </tr>
+        <tr>
+            <td><h5>state.backend.forst.block.metadata-blocksize</h5></td>
+            <td style="word-wrap: break-word;">4 kb</td>
+            <td>MemorySize</td>
+            <td>Approximate size of partitioned metadata packed per block. 
Currently applied to indexes block when partitioned index/filters option is 
enabled. The default blocksize is '4KB'.</td>
+        </tr>
+        <tr>
+            <td><h5>state.backend.forst.bloom-filter.bits-per-key</h5></td>
+            <td style="word-wrap: break-word;">10.0</td>
+            <td>Double</td>
+            <td>Bits per key that the bloom filter will use; this only takes effect 
when a bloom filter is used. The default value is 10.0.</td>
+        </tr>
+        <tr>
+            <td><h5>state.backend.forst.bloom-filter.block-based-mode</h5></td>
+            <td style="word-wrap: break-word;">false</td>
+            <td>Boolean</td>
+            <td>If true, ForSt will use a block-based filter instead of a full 
filter; this only takes effect when a bloom filter is used. The default value is 
'false'.</td>
+        </tr>
+        <tr>
+            
<td><h5>state.backend.forst.compaction.filter.periodic-compaction-time</h5></td>
+            <td style="word-wrap: break-word;">30 d</td>
+            <td>Duration</td>
+            <td>Periodic compaction could speed up expired state entries 
cleanup, especially for state entries rarely accessed. Files older than this 
value will be picked up for compaction, and re-written to the same level as 
they were before. It makes sure a file goes through compaction filters 
periodically. 0 means turning off periodic compaction. The default value is 
'30days'.</td>
+        </tr>
+        <tr>
+            
<td><h5>state.backend.forst.compaction.filter.query-time-after-num-entries</h5></td>
+            <td style="word-wrap: break-word;">1000</td>
+            <td>Long</td>
+            <td>Number of state entries to process by compaction filter before 
updating current timestamp. Updating the timestamp more often can improve 
cleanup speed, but it decreases compaction performance because it uses JNI 
calls from native code. The default value is '1000L'.</td>
+        </tr>
+        <tr>
+            
<td><h5>state.backend.forst.compaction.level.max-size-level-base</h5></td>
+            <td style="word-wrap: break-word;">256 mb</td>
+            <td>MemorySize</td>
+            <td>The upper-bound of the total size of level base files in 
bytes. The default value is '256MB'.</td>
+        </tr>
+        <tr>
+            
<td><h5>state.backend.forst.compaction.level.target-file-size-base</h5></td>
+            <td style="word-wrap: break-word;">64 mb</td>
+            <td>MemorySize</td>
+            <td>The target file size for compaction, which determines a 
level-1 file size. The default value is '64MB'.</td>
+        </tr>
+        <tr>
+            
<td><h5>state.backend.forst.compaction.level.use-dynamic-size</h5></td>
+            <td style="word-wrap: break-word;">false</td>
+            <td>Boolean</td>
+            <td>If true, ForSt will pick target size of each level 
dynamically. From an empty DB, ForSt would make last level the base level, 
which means merging L0 data into the last level, until it exceeds 
max_bytes_for_level_base. And then repeat this process for second last level 
and so on. The default value is 'false'. For more information, please refer to 
<a 
href="https://github.com/facebook/rocksdb/wiki/Leveled-Compaction#level_compaction_dynamic_level_bytes-is-true">RocksDB's
 doc.</a></td>
+        </tr>
+        <tr>
+            <td><h5>state.backend.forst.compaction.style</h5></td>
+            <td style="word-wrap: break-word;">LEVEL</td>
+            <td><p>Enum</p></td>
+            <td>The specified compaction style for DB. Candidate compaction 
style is LEVEL, FIFO, UNIVERSAL or NONE, and Flink chooses 'LEVEL' as default 
style.<br /><br />Possible 
values:<ul><li>"LEVEL"</li><li>"UNIVERSAL"</li><li>"FIFO"</li><li>"NONE"</li></ul></td>
+        </tr>
+        <tr>
+            <td><h5>state.backend.forst.compression.per.level</h5></td>
+            <td style="word-wrap: break-word;">SNAPPY_COMPRESSION</td>
+            <td><p>List&lt;Enum&gt;</p></td>
+            <td>A semicolon-separated list of Compression Type. Different 
levels can have different compression policies. In many cases, lower levels use 
fast compression algorithms, while higher levels with more data use slower but 
more effective compression algorithms. The N th element in the List corresponds 
to the compression type of the level N-1. When <code 
class="highlighter-rouge">state.backend.forst.compaction.level.use-dynamic-size</code>
 is true, compression_per_level[0] still determines L0, but other elements are 
based on the base level and may not match the level seen in the info log<br 
/>Note: If the List size is smaller than the level number, the undefined lower 
level uses the last Compression Type in the List<br />Some commonly used 
compression algorithms for candidates include <code 
class="highlighter-rouge">NO_COMPRESSION</code> ,<code 
class="highlighter-rouge">SNAPPY_COMPRESSION</code> and <code 
class="highlighter-rouge">LZ4_COMPRESSION</code><br />The default value
  is <code class="highlighter-rouge">SNAPPY_COMPRESSION</code>, which means 
that all data uses the Snappy compression algorithm. Likewise, if set to <code 
class="highlighter-rouge">NO_COMPRESSION</code> , means that all data is not 
compressed, which will achieve faster speed but will bring some space 
amplification. In addition, if we need to consider both spatial amplification 
and performance, we can also set it to '<code 
class="highlighter-rouge">NO_COMPRESSION</code>;<code 
class="highlighter-rouge">NO_COMPRESSION</code>;<code 
class="highlighter-rouge">LZ4_COMPRESSION</code>', which means that L0 and L1 
data will not be compressed, and other data will be compressed using LZ4.<br 
/><br />Possible 
values:<ul><li>"NO_COMPRESSION"</li><li>"SNAPPY_COMPRESSION"</li><li>"ZLIB_COMPRESSION"</li><li>"BZLIB2_COMPRESSION"</li><li>"LZ4_COMPRESSION"</li><li>"LZ4HC_COMPRESSION"</li><li>"XPRESS_COMPRESSION"</li><li>"ZSTD_COMPRESSION"</li><li>"DISABLE_COMPRESSION_OPTION"</li></ul></td>
+        </tr>
+        <tr>
+            <td><h5>state.backend.forst.files.open</h5></td>
+            <td style="word-wrap: break-word;">-1</td>
+            <td>Integer</td>
+            <td>The maximum number of open files (per stateful operator) that 
can be used by the DB, '-1' means no limit. The default value is '-1'.</td>
+        </tr>
+        <tr>
+            <td><h5>state.backend.forst.log.dir</h5></td>
+            <td style="word-wrap: break-word;">(none)</td>
+            <td>String</td>
+            <td>The directory for ForSt's information logging files. If empty 
(Flink default setting), log files will be in the same directory as the Flink 
log. If non-empty, this directory will be used and the data directory's 
absolute path will be used as the prefix of the log file name. If setting this 
option as a non-existing location, e.g. '/dev/null', ForSt will then create the 
log under its own database folder as before.</td>

Review Comment:
   How would this work in Kubernetes, where log forwarding could be in place?



