[ https://issues.apache.org/jira/browse/SAMZA-957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jake Maes updated SAMZA-957:
----------------------------
    Description: 
We had an issue where RocksDB performance severely degraded for 23 hours and 
then resolved itself. To troubleshoot it, I gathered samples of the compaction 
stats from the RocksDB log and engaged the RocksDB team via an existing, 
related issue: 
https://github.com/facebook/rocksdb/issues/696#issuecomment-222549220

They pointed out that the job was flushing excessively:
{quote}
If you overload RocksDB with work (i.e. do bunch of writes really fast, or in 
your case, bunch of small flushes), it will begin stalling writes while the 
compactions (deferred work) completes. An interesting thing with RocksDB and 
LSM architecture is that the more behind you are on compactions, the more 
expensive the compactions are (due to increased write amplifications and 
single-threadness of L0->L1 compaction). So our write stalls have to be tuned 
exactly right for RocksDB to behave well with extremely high write rate.
{quote}
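
To make the failure mode concrete, here is a minimal RocksJava sketch of the 
anti-pattern described above. It is purely illustrative (the path, key count, 
and class name are made up; this is not how Samza drives RocksDB): each 
explicit flush turns a nearly empty memtable into a tiny L0 file, and because 
L0->L1 compaction is single-threaded, the backlog grows until RocksDB begins 
stalling writes.

{code:java}
import org.rocksdb.FlushOptions;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class SmallFlushDemo {
  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    try (Options options = new Options().setCreateIfMissing(true);
         RocksDB db = RocksDB.open(options, "/tmp/small-flush-demo");
         FlushOptions waitForFlush = new FlushOptions().setWaitForFlush(true)) {
      for (int i = 0; i < 10000; i++) {
        db.put(("key-" + i).getBytes(), ("value-" + i).getBytes());
        // Anti-pattern: an explicit flush after every write produces a stream
        // of tiny L0 SST files instead of a few full memtables. The
        // single-threaded L0->L1 compaction falls behind, and RocksDB stalls
        // writes until it catches up -- the degradation we observed.
        db.flush(waitForFlush);
      }
    }
  }
}
{code}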

Looking through our commit history, I see that SAMZA-812 and SAMZA-873 both 
intended to address this issue by reducing the number of flushes in 
CachedStore. 

To be fair, the job in question did not have the SAMZA-873 patch, but I see 
even more room for improvement. Namely, CachedStore should *never* flush the 
underlying store unless its own flush() is called. It can write its dirty 
items through to the underlying store, trading some performance for 
correctness, but an actual flush is excessive. So, this patch will remove the 
flushes from the all() and range() methods, simplify the LRU logic, and add a 
unit test that verifies and documents the proper LRU behavior.
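
For clarity, the intended contract looks roughly like the simplified sketch 
below. The names (WriteBehindCache, SimpleStore) are illustrative, not the 
actual Scala CachedStore, which also handles LRU eviction, write batching, and 
serdes; the point is only that all() writes dirty entries through but never 
triggers a store flush.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal stand-in for a key-value store; illustrative, not the Samza API.
interface SimpleStore<K, V> {
  V get(K key);
  void put(K key, V value);
  void flush();
  Iterable<Map.Entry<K, V>> all();
}

class WriteBehindCache<K, V> {
  private final SimpleStore<K, V> store;
  private final Map<K, V> dirty = new LinkedHashMap<>();

  WriteBehindCache(SimpleStore<K, V> store) {
    this.store = store;
  }

  void put(K key, V value) {
    dirty.put(key, value);  // buffered; the underlying store is untouched
  }

  V get(K key) {
    V cached = dirty.get(key);
    return cached != null ? cached : store.get(key);
  }

  // Iteration needs the buffered writes to be visible, so purge them into the
  // underlying store -- but do NOT flush it. That is the whole fix.
  Iterable<Map.Entry<K, V>> all() {
    purge();
    return store.all();
  }

  // Only an explicit flush() on the cache flushes the underlying store.
  void flush() {
    purge();
    store.flush();
  }

  private void purge() {
    for (Map.Entry<K, V> entry : dirty.entrySet()) {
      store.put(entry.getKey(), entry.getValue());
    }
    dirty.clear();
  }
}
{code}

With this contract, a purge from all() or range() costs only a batch of puts 
into the RocksDB memtable; an on-disk flush happens only when flush() is 
called explicitly, e.g. at commit time.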


> Avoid unnecessary KV Store flushes (part 3)
> -------------------------------------------
>
>                 Key: SAMZA-957
>                 URL: https://issues.apache.org/jira/browse/SAMZA-957
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Jake Maes
>            Assignee: Jake Maes
>         Attachments: SAMZA-957_1.patch
>


