dchristle commented on code in PR #21729: URL: https://github.com/apache/flink/pull/21729#discussion_r1081578012
########## flink-python/pyflink/datastream/state_backend.py: ########## @@ -1000,21 +1000,24 @@ class PredefinedOptions(Enum): The following options are set: + - BlockBasedTableConfig.setBlockCacheSize(256 MBytes) + - BlockBasedTableConfig.setBlockSize(128 KBytes) + - BlockBasedTableConfig.setFilterPolicy(BloomFilter( + `BLOOM_FILTER_BITS_PER_KEY`, + `BLOOM_FILTER_BLOCK_BASED_MODE`) - setLevelCompactionDynamicLevelBytes(true) - - setTargetFileSizeBase(256 MBytes) + - setMaxBackgroundJobs(4) - setMaxBytesForLevelBase(1 GByte) - - setWriteBufferSize(64 MBytes) - - setIncreaseParallelism(4) - - setMinWriteBufferNumberToMerge(3) - - setMaxWriteBufferNumber(4) - - setUseFsync(false) - setMaxOpenFiles(-1) - - BlockBasedTableConfig.setBlockCacheSize(256 MBytes) - - BlockBasedTableConfigsetBlockSize(128 KBytes) + - setMaxWriteBufferNumber(4) Review Comment: Originally, I wanted to remove just the erroneous `disableDataSync`, but in doing this, I noticed other errors like the missing Bloom filter or that the Python documentation implies `setUseFSync(false)` is set while the Java docs don't (not technically a mistake, but it's confusing & not easy to see from the code). Sorting isn't strictly necessary, but it _was_ part of how I found the Bloom filter setting was undocumented & manually verified each group of options now matches its description. It lets readers quickly determine whether a particular setting they have in mind is changed or not. Since it is easier to see at a glance when a new config is added, or an old one removed, future maintainers have a lower probability of introducing bugs or making the documentation & code diverge like it did for the Bloom filter. It isn't strictly necessary to sort in this PR, either, but the overhead of creating a separate PR/separate JIRA issue seems too high for such a simple change. I can add these other non-idealities that I fixed into the JIRA issue description. Would that be an OK path forward, rather than deferring these minor polishes into separate PRs? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org