dchristle commented on code in PR #21729:
URL: https://github.com/apache/flink/pull/21729#discussion_r1081578012


##########
flink-python/pyflink/datastream/state_backend.py:
##########
@@ -1000,21 +1000,24 @@ class PredefinedOptions(Enum):
 
     The following options are set:
 
+    - BlockBasedTableConfig.setBlockCacheSize(256 MBytes)
+    - BlockBasedTableConfig.setBlockSize(128 KBytes)
+    - BlockBasedTableConfig.setFilterPolicy(BloomFilter(
+        `BLOOM_FILTER_BITS_PER_KEY`,
+        `BLOOM_FILTER_BLOCK_BASED_MODE`)
     - setLevelCompactionDynamicLevelBytes(true)
-    - setTargetFileSizeBase(256 MBytes)
+    - setMaxBackgroundJobs(4)
     - setMaxBytesForLevelBase(1 GByte)
-    - setWriteBufferSize(64 MBytes)
-    - setIncreaseParallelism(4)
-    - setMinWriteBufferNumberToMerge(3)
-    - setMaxWriteBufferNumber(4)
-    - setUseFsync(false)
     - setMaxOpenFiles(-1)
-    - BlockBasedTableConfig.setBlockCacheSize(256 MBytes)
-    - BlockBasedTableConfigsetBlockSize(128 KBytes)
+    - setMaxWriteBufferNumber(4)

Review Comment:
   Originally, I wanted to remove just the erroneous `disableDataSync`, but in 
doing this, I noticed other errors like the missing Bloom filter or that the 
Python documentation implies `setUseFSync(false)` is set while the Java docs 
don't (not technically a mistake, but it's confusing & not easy to see from the 
code). 
   
   Sorting isn't strictly necessary, but it _was_ part of how I found the Bloom 
filter setting was undocumented & manually verified each group of options now 
matches its description. It lets readers quickly determine whether a particular 
setting they have in mind is changed or not. Since it is easier to see at a 
glance when a new config is added, or an old one removed, future maintainers 
have a lower probability of introducing bugs or making the documentation & code 
diverge like it did for the Bloom filter.
   
   It isn't strictly necessary to sort in this PR, either, but the overhead of 
creating a separate PR/separate JIRA issue seems too high for such a simple 
change. 
   
   I can add these other non-idealities that I fixed into the JIRA issue 
description. Would that be an OK path forward, rather than deferring these 
minor polishes into separate PRs?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to