[ 
https://issues.apache.org/jira/browse/FLINK-22752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17374995#comment-17374995
 ] 

Stephan Ewen commented on FLINK-22752:
--------------------------------------

Adding comment from [~yunta] from duplicate issue:

> If using rocksDB state backend by default, and plan to enable partitioned 
> index & filters by default, we could also enable bloom filter by default to 
> improve the performance.

I think this makes sense, let's add this.

> Add robust default state configuration to StateFun
> --------------------------------------------------
>
>                 Key: FLINK-22752
>                 URL: https://issues.apache.org/jira/browse/FLINK-22752
>             Project: Flink
>          Issue Type: Sub-task
>            Reporter: Stephan Ewen
>            Priority: Major
>
> We aim to reduce the state configuration complexity by applying a default 
> configuration with robust settings, based on lessons learned in Flink.
> *(1) Always use RocksDB.*
> _That is already the case._
> We keep this for now, as long as the only other alternative are backends with 
> Objects on the heap, which are tricky in terms of predictable JVM 
> performance. RocksDB has a significant performance cost, but more robust 
> behavior.
> *(2) Activate local recovery by default.*
> That makes recovery cheao for soft tasks failures and gracefully cancelled 
> tasks.
> We need to set these options:
>   - {{state.backend.local-recovery: true}}
>   - {{taskmanager.state.local.root-dirs: <dir>}} - some local directory that 
> will not possibly be wiped by the OS periodically, so typically some local 
> directory that is not {{/tmp}}, for example {{/local/state/recovery}}.
>   - {{state.backend.rocksdb.localdir: <dir>}} - a directory on the same FS / 
> device as above, so that one can create hard links between them (required for 
> RocksDB local checkpoints), for example {{/local/state/rocksdb}}.
> Flink will most likely adopt this as a default setting as well in the future.
> It still makes sense to pre-configer a different RocksDB working directory 
> than {{/tmp}}.
> *(3) Activate partitioned indexes by default.*
> This may cost minimal performance in some cases, but can avoid massive 
> performance regression in cases where the index blocks no longer fit into the 
> memory cache (may happen more frequently when there are too many 
> ColumnFamilies = states).
> Set {{state.backend.rocksdb.memory.partitioned-index-filters: true}}.
> See FLINK-20496 for details.
> *(4) Increase number of transfer threads by default.*
> This speeds up state recovery in many cases. The default value in Flink is a 
> bit conservative, to avoid spamming DFS (like HDFS) by default. The more 
> cloud-centric StateFun setups should be safe to use higher default value.
> Set {{state.backend.rocksdb.checkpoint.transfer.thread.num: 8}}.
> *(5) Increase RocksDB compaction threads by default.*
> The number of RocksDB compaction threads is frequently a bottleneck.
> Increasing it costs virtually nothing and mitigates that bottleneck in most 
> cases.
> {{state.backend.rocksdb.thread.num: 4}}
> _(this value is chosen under the assumption that there is only one slot per 
> TM in StateFun)._



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to