[jira] [Commented] (FLINK-19303) Disable WAL in RocksDB recovery

Flink Jira Bot (Jira) Fri, 16 Apr 2021 03:54:45 -0700


    [ https://issues.apache.org/jira/browse/FLINK-19303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17322959#comment-17322959 ]


Flink Jira Bot commented on FLINK-19303:
----------------------------------------

This issue is assigned but has not received an update in 7 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> Disable WAL in RocksDB recovery
> -------------------------------
>
>                 Key: FLINK-19303
>                 URL: https://issues.apache.org/jira/browse/FLINK-19303
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / State Backends
>            Reporter: Juha Mynttinen
>            Assignee: Juha Mynttinen
>            Priority: Major
>              Labels: stale-assigned
>
> During recovery of {{RocksDBStateBackend}}, the recovery mechanism writes the
> key-value pairs into the local RocksDB instance(s). To speed up the process,
> the recovery process uses the RocksDB write batch mechanism. [RocksDB
> WAL|https://github.com/facebook/rocksdb/wiki/Write-Ahead-Log] is enabled
> during this process.
> During normal operations, i.e. when the state backend has been recovered and
> the Flink application is running (on the RocksDB state backend), WAL is
> disabled.
> The recovery process doesn't need WAL. In fact, recovery should be much
> faster without WAL. Thus, WAL should be disabled in the recovery process.
> AFAIK the last thing that was done with WAL during recovery was an attempt to
> remove it. That removal was later reverted because it caused stability issues
> (https://issues.apache.org/jira/browse/FLINK-8922).
> Unfortunately, the root cause of the segfault that occurs when WAL is disabled
> during recovery is unknown. After all, WAL is not used during normal
> operations.
> A potential explanation is some kind of bug in the RocksDB write batch when
> using WAL. It is possible that later RocksDB versions have fixes / workarounds
> for the issue.
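
The idea in the description can be illustrated with a minimal, self-contained
Java sketch. It is a toy model, not Flink's restore code and not the RocksDB
API: ToyStore, restore and the disableWal flag are hypothetical names invented
for illustration. In RocksDB's Java API the corresponding switch is
WriteOptions.setDisableWAL(true), passed together with a WriteBatch to
RocksDB.write. The sketch only shows that a batched restore ends with the same
contents with or without the log, while skipping the per-record log append.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class WalRecoverySketch {

    /** Toy key-value store (hypothetical, NOT RocksDB). */
    static final class ToyStore {
        final Map<String, String> memtable = new LinkedHashMap<>();
        final List<String> walRecords = new ArrayList<>();

        /** Applies a batch; appends one log record per entry unless WAL is disabled. */
        void write(Map<String, String> batch, boolean disableWal) {
            if (!disableWal) {
                for (Map.Entry<String, String> e : batch.entrySet()) {
                    walRecords.add(e.getKey() + "=" + e.getValue());
                }
            }
            memtable.putAll(batch);
        }
    }

    /** Rebuilds a store from snapshot entries, flushing every batchSize entries. */
    static ToyStore restore(Map<String, String> snapshot, int batchSize, boolean disableWal) {
        ToyStore store = new ToyStore();
        Map<String, String> batch = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : snapshot.entrySet()) {
            batch.put(e.getKey(), e.getValue());
            if (batch.size() >= batchSize) {
                store.write(batch, disableWal);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            store.write(batch, disableWal); // flush the final partial batch
        }
        return store;
    }

    public static void main(String[] args) {
        Map<String, String> snapshot = new LinkedHashMap<>();
        for (int i = 0; i < 5; i++) {
            snapshot.put("k" + i, "v" + i);
        }
        ToyStore withWal = restore(snapshot, 2, false);
        ToyStore withoutWal = restore(snapshot, 2, true);
        System.out.println("same contents: " + withWal.memtable.equals(withoutWal.memtable));
        System.out.println("log records with WAL: " + withWal.walRecords.size());
        System.out.println("log records without WAL: " + withoutWal.walRecords.size());
    }
}
```

If a restore modeled this way is interrupted, it can simply be restarted from
the same snapshot, which is why the log adds work without adding safety here.
The sketch says nothing about the segfault reported in FLINK-8922.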



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
