[ https://issues.apache.org/jira/browse/FLINK-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Flink Jira Bot updated FLINK-22749: ----------------------------------- Labels: stale-major (was: ) I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help the community manage its development. I see this issues has been marked as Major but is unassigned and neither itself nor its Sub-Tasks have been updated for 30 days. I have gone ahead and added a "stale-major" to the issue". If this ticket is a Major, please either assign yourself or give an update. Afterwards, please remove the label or in 7 days the issue will be deprioritized. > Apply a robust default State Backend Configuration > -------------------------------------------------- > > Key: FLINK-22749 > URL: https://issues.apache.org/jira/browse/FLINK-22749 > Project: Flink > Issue Type: Improvement > Components: Stateful Functions > Reporter: Stephan Ewen > Priority: Major > Labels: stale-major > > We should update the default state backend configuration with default > settings that reflect lessons-learned about robust setups. > (1) Always use the RocksDB State Backend. That is already the case. > (2) Active Partitioned Index filters by default. This may cost some overhead > in specific cases, but helps with massive performance regressions when we > have too many ColumnFamilies (too many states) such that the cache can no > longer hold all index files. > We need to add {{state.backend.rocksdb.memory.partitioned-index-filters: > true}} to the config. > See FLINK-20496 for details. > There is a chance that Flink makes this the default in the future as well, > then we could remove it again from the StateFun setup. > (3) Activate local recovery by default. > That should speed up the recovery of all non-hard-crashed TMs by a lot. > We need to configure > - {{state.backend.local-recovery: true}} > - {{taskmanager.state.local.root-dirs}} to some non-temp directory > For this to work reliably, we need a local directory that is not periodically > wiped by the OS, so we should not rely on the default ({{/tmp}} directory, > but set up a dedicated non-temp state directory. > Flink will probably make this the default in the future, but having a > non-{{/tmp}} directory for the RocksDB and local snapshots makes still a lot > of sense. > (4) Increase state transfer threads by default, to speed up state restores. > Add to the config: {{state.backend.rocksdb.checkpoint.transfer.thread.num: 8}} -- This message was sent by Atlassian Jira (v8.3.4#803005)