[
https://issues.apache.org/jira/browse/FLINK-19303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17199306#comment-17199306
]
Juha Mynttinen commented on FLINK-19303:
----------------------------------------
Sure, you can do that [~liyu].
> Disable WAL in RocksDB recovery
> -------------------------------
>
> Key: FLINK-19303
> URL: https://issues.apache.org/jira/browse/FLINK-19303
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / State Backends
> Reporter: Juha Mynttinen
> Priority: Major
>
> During recovery of {{RocksDBStateBackend}}, the recovery mechanism writes the
> key-value pairs to the local RocksDB instance(s). To speed up the process, the
> recovery process uses the RocksDB write batch mechanism. [RocksDB
> WAL|https://github.com/facebook/rocksdb/wiki/Write-Ahead-Log] is enabled
> during this process.
> During normal operations, i.e. when the state backend has been recovered and
> the Flink application is running (on the RocksDB state backend), WAL is disabled.
> The recovery process doesn't need WAL. In fact the recovery should be much
> faster without WAL. Thus, WAL should be disabled in the recovery process.
> AFAIK the last thing that was done with WAL during recovery was an attempt to
> remove it. That removal was later reverted because it caused stability issues
> (https://issues.apache.org/jira/browse/FLINK-8922).
> Unfortunately, the root cause of why disabling WAL causes a segfault during
> recovery is unknown. After all, WAL is not used during normal operations.
> A potential explanation is some kind of bug in the RocksDB write batch when
> using WAL. It is possible that later RocksDB versions have fixes or
> workarounds for the issue.
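
For illustration only, here is a minimal sketch of what a WAL-free batched write looks like with the public RocksJava API ({{WriteOptions#setDisableWAL}}, {{WriteBatch}}, {{RocksDB#write}}). It is not Flink's actual restore code: the class name {{WalFreeRestoreSketch}}, the helper {{restore}}, the temp directory and the sample keys are all hypothetical, and it assumes the {{rocksdbjni}} artifact is on the classpath.

```java
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.WriteBatch;
import org.rocksdb.WriteOptions;

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.LinkedHashMap;
import java.util.Map;

public class WalFreeRestoreSketch {

    // Hypothetical helper: bulk-loads restored entries in a single write batch
    // with the write-ahead log switched off for that write.
    static void restore(RocksDB db, Map<String, String> entries) throws RocksDBException {
        try (WriteOptions writeOptions = new WriteOptions().setDisableWAL(true);
             WriteBatch batch = new WriteBatch()) {
            for (Map.Entry<String, String> e : entries.entrySet()) {
                batch.put(
                        e.getKey().getBytes(StandardCharsets.UTF_8),
                        e.getValue().getBytes(StandardCharsets.UTF_8));
            }
            // The batch is applied to the memtable only; nothing is appended to the WAL.
            db.write(writeOptions, batch);
        }
    }

    public static void main(String[] args) throws Exception {
        RocksDB.loadLibrary();

        // Throwaway database directory, standing in for the local RocksDB instance.
        String path = Files.createTempDirectory("wal-free-restore").toString();

        // Sample data standing in for the key-value pairs read from a snapshot.
        Map<String, String> entries = new LinkedHashMap<>();
        entries.put("key-1", "value-1");
        entries.put("key-2", "value-2");

        try (Options options = new Options().setCreateIfMissing(true);
             RocksDB db = RocksDB.open(options, path)) {
            restore(db, entries);
            // Read one key back to confirm the batch was applied.
            byte[] value = db.get("key-1".getBytes(StandardCharsets.UTF_8));
            System.out.println(new String(value, StandardCharsets.UTF_8));
        }
    }
}
```

Skipping the WAL means the written data would be lost if the process crashed before a flush. During recovery that should be acceptable, because the restore can simply be restarted from the same snapshot. Whether this path is stable with the RocksDB version Flink bundles is exactly the open question of this issue.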
--
This message was sent by Atlassian Jira
(v8.3.4#803005)