[ 
https://issues.apache.org/jira/browse/FLINK-19303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flink Jira Bot updated FLINK-19303:
-----------------------------------
    Labels: stale-assigned  (was: )

I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help 
the community manage its development. I see this issue is assigned but has not 
received an update in 14, so it has been labeled "stale-assigned".
If you are still working on the issue, please remove the label and add a 
comment updating the community on your progress.  If this issue is waiting on 
feedback, please consider this a reminder to the committer/reviewer. Flink is a 
very active project, and so we appreciate your patience.
If you are no longer working on the issue, please unassign yourself so someone 
else may work on it. If the "warning_label" label is not removed in 7 days, the 
issue will be automatically unassigned.


> Disable WAL in RocksDB recovery
> -------------------------------
>
>                 Key: FLINK-19303
>                 URL: https://issues.apache.org/jira/browse/FLINK-19303
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / State Backends
>            Reporter: Juha Mynttinen
>            Assignee: Juha Mynttinen
>            Priority: Major
>              Labels: stale-assigned
>
> During recovery of {{RocksDBStateBackend}} the recovery mechanism puts the 
> key value pairs to local RocksDB instance(s). To speed up the process, the 
> recovery process uses RocskDB write batch mechanism. [RocksDB 
> WAL|https://github.com/facebook/rocksdb/wiki/Write-Ahead-Log]  is enabled 
> during this process.
> During normal operations, i.e. when the state backend has been recovered and 
> the Flink application is running (on RocksDB state backend) WAL is disabled.
> The recovery process doesn't need WAL. In fact the recovery should be much 
> faster without WAL. Thus, WAL should be disabled in the recovery process.
> AFAIK the last thing that was done with WAL during recovery was an attempt to 
> remove it. Later that removal was removed because it causes stability issues 
> (https://issues.apache.org/jira/browse/FLINK-8922).
> Unfortunately the root cause why disabling WAL causes segfault during 
> recovery is unknown. After all, WAL is not used during normal operations.
> Potential explanation is some kind of bug in RocksDB write batch when using 
> WAL. It is possible later RocksDB versions have fixes / workarounds for the 
> issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to