[jira] [Commented] (FLINK-8922) Revert FLINK-8859 because it causes segfaults in testing

Sihua Zhou (JIRA) Tue, 13 Mar 2018 04:19:16 -0700

    [ 
https://issues.apache.org/jira/browse/FLINK-8922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16396836#comment-16396836
 ]


Sihua Zhou commented on FLINK-8922:
-----------------------------------

After several attempts (bumping {{rocksdbjni}} to a higher version, using 
{{WriteOptions}} via {{WriteBatch}}, rewriting the recovery code to exclude 
possible causes), I still can't figure out why deactivating the WAL can lead to 
a segfault. Maybe it's really a bug in RocksDB (but I can't find any evidence 
of that either)...

As a next step I want to take a deeper look at RocksDB itself. Here I also want 
to add some comments on the cost problem that [~StephanEwen] mentioned. I 
think that even if we can't deactivate the WAL on restore (the worst case), we 
could still get much better performance than we have now by using 
{{WriteBatch}}. Fortunately, {{WriteBatch}} is not very sensitive to the WAL: 
even with the WAL enabled, it still performs better than {{put()}} with the WAL 
deactivated in all but the smallest run. Here are some measurements (taken on 
my Mac).
{code}
--> put with disableWAL=true VS put with disableWAL=false <--
number:1000 put cost:6 ms
number:1000 put cost:17 ms
number:10000 put cost:48 ms
number:10000 put cost:106 ms
number:100000 put cost:857 ms
number:100000 put cost:1871 ms
number:1000000 put cost:3654 ms
number:1000000 put cost:9416 ms
--> put with disableWAL=true VS write batch with disableWAL=false <--
number:1000 put cost:4 ms
number:1000 write batch cost:5 ms
number:10000 put cost:41 ms
number:10000 write batch cost:25 ms
number:100000 put cost:372 ms
number:100000 write batch cost:262 ms
number:1000000 put cost:3869 ms
number:1000000 write batch cost:2751 ms
--> write batch with disableWAL=true VS write batch with disableWAL=false <--
number:1000 write batch cost:3 ms
number:1000 write batch cost:4 ms
number:10000 write batch cost:21 ms
number:10000 write batch cost:27 ms
number:100000 write batch cost:243 ms
number:100000 write batch cost:278 ms
number:1000000 write batch cost:2495 ms
number:1000000 write batch cost:2818 ms
{code}
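To make the comparison easier to read, the 1,000,000-write rows above can be turned into cost ratios. This is only a small sketch over the posted figures; the class and method names ({{WalCostRatios}}, {{ratio}}) are made up for illustration and are not part of Flink or RocksDB.

```java
// Cost ratios derived from the 1,000,000-write rows of the measurements above.
// WalCostRatios and ratio() are hypothetical names used only for this sketch.
class WalCostRatios {

    /** Returns how many times longer the first run took than the second. */
    static double ratio(long slowerMs, long fasterMs) {
        return (double) slowerMs / fasterMs;
    }

    public static void main(String[] args) {
        // put(): WAL enabled (9416 ms) vs. WAL disabled (3654 ms)
        System.out.printf("put, WAL on vs off: %.2fx%n", ratio(9416, 3654));

        // put() with WAL disabled (3869 ms) vs. write batch with WAL enabled (2751 ms)
        System.out.printf("put (WAL off) vs write batch (WAL on): %.2fx%n", ratio(3869, 2751));

        // write batch: WAL enabled (2818 ms) vs. WAL disabled (2495 ms)
        System.out.printf("write batch, WAL on vs off: %.2fx%n", ratio(2818, 2495));
    }
}
```

On these numbers, enabling the WAL costs {{put()}} far more than it costs a write batch, which is why the batch with the WAL on still beats {{put()}} with the WAL off at this size.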
There is already a [JIRA|https://issues.apache.org/jira/browse/FLINK-8845] and 
a [PR|https://github.com/apache/flink/pull/5650] for 1.6 related to this (I'm 
not asking to merge it right now). I will report back if I make any progress. ;)




> Revert FLINK-8859 because it causes segfaults in testing
> --------------------------------------------------------
>
>                 Key: FLINK-8922
>                 URL: https://issues.apache.org/jira/browse/FLINK-8922
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.5.0
>            Reporter: Stefan Richter
>            Assignee: Stefan Richter
>            Priority: Major
>             Fix For: 1.5.0
>
>
> We need to revert FLINK-8859 because it causes problems with RocksDB that 
> make our automated tests fail on Travis. The change actually looks good and 
> it is currently unclear why it can introduce such a problem. This might 
> also be a bug in RocksDB. Nevertheless, for the sake of proper release 
> testing, we should revert the change for now.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
