[
https://issues.apache.org/jira/browse/SAMZA-543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299608#comment-14299608
]
Jay Kreps commented on SAMZA-543:
---------------------------------
Oh, wow, yeah, well that may explain the performance. The batch write was about
a 5x perf improvement for write-intensive workloads in LevelDB. Did you guys
perf test the RocksDB store when it was added?
But actually what I was describing wasn't batch write but rather disabling
compaction. RocksDB has some way to do "batched compaction"; their benchmark on
batch load perf describes this. Basically, instead of doing a ton of compaction
online during the bootstrap load, you just load all the data (which is mostly
deduplicated anyway) and then, when the bootstrap is complete, do a single
global sort. This should be a lot faster because otherwise you potentially
compact each record many times for no reason.
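To make the cost argument concrete, here is a rough toy model in plain Java. It
is only a sketch and not RocksDB's actual compaction algorithm: the class name
BulkLoadSketch, the buffer size, and the "entries written" counter are all
assumptions made up for illustration. It compares re-merging a sorted store
every time a write buffer fills against appending everything once and doing one
sort and deduplication pass at the end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model only: counts how many entries get (re)written under two load strategies.
class BulkLoadSketch {

    /** Final sorted contents plus the number of entries written along the way. */
    static final class Result {
        final TreeMap<String, String> store;
        final long entriesWritten;

        Result(TreeMap<String, String> store, long entriesWritten) {
            this.store = store;
            this.entriesWritten = entriesWritten;
        }
    }

    /** Online strategy: each time the buffer fills, merge it in and rewrite the whole sorted store. */
    static Result loadWithOnlineCompaction(List<String[]> records, int bufferSize) {
        TreeMap<String, String> store = new TreeMap<>();
        TreeMap<String, String> buffer = new TreeMap<>();
        long written = 0;
        for (String[] kv : records) {
            buffer.put(kv[0], kv[1]);
            if (buffer.size() >= bufferSize) {
                store.putAll(buffer);
                written += store.size(); // the merged run is rewritten in full
                buffer.clear();
            }
        }
        if (!buffer.isEmpty()) {
            store.putAll(buffer);
            written += store.size();
        }
        return new Result(store, written);
    }

    /** Deferred strategy: append every record once, then one global sort/dedupe at the end. */
    static Result loadWithDeferredCompaction(List<String[]> records) {
        List<String[]> log = new ArrayList<>(records.size());
        long written = 0;
        for (String[] kv : records) {
            log.add(kv); // unsorted append, no merging during the load
            written++;
        }
        TreeMap<String, String> store = new TreeMap<>();
        for (String[] kv : log) {
            store.put(kv[0], kv[1]); // later values for the same key win
        }
        written += store.size(); // the single final sorted write
        return new Result(store, written);
    }

    /** Hypothetical bootstrap data: n distinct keys arriving in scrambled order. */
    static List<String[]> sampleRecords(int n) {
        List<String[]> records = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            String key = String.format("key-%06d", (i * 7919) % n);
            records.add(new String[] {key, "v" + i});
        }
        return records;
    }

    public static void main(String[] args) {
        List<String[]> records = sampleRecords(1000);
        Result online = loadWithOnlineCompaction(records, 100);
        Result deferred = loadWithDeferredCompaction(records);
        System.out.println("online entries written:   " + online.entriesWritten);
        System.out.println("deferred entries written: " + deferred.entriesWritten);
        System.out.println("same final contents:      " + online.store.equals(deferred.store));
    }
}
```

In this model the online strategy rewrites old entries on every flush, so its
write count grows roughly quadratically with the number of flushes, while the
deferred strategy writes each entry about twice. Real LSM compaction is leveled
and far cheaper than a full rewrite per flush, so the gap here is exaggerated,
but the direction of the effect is the point.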
> Disable WAL in RocksDB KV store
> -------------------------------
>
> Key: SAMZA-543
> URL: https://issues.apache.org/jira/browse/SAMZA-543
> Project: Samza
> Issue Type: Bug
> Components: kv
> Affects Versions: 0.9.0
> Reporter: Chris Riccomini
> Fix For: 0.9.0
>
>
> RocksDB uses a write-ahead log by default. This is unnecessary in Samza,
> since we have full durability from a state store's changelog topic. We should
> [disable the
> WAL|https://github.com/facebook/rocksdb/wiki/Basic-Operations#asynchronous-writes]
> in the RocksDB KV store.