shekhars-li opened a new pull request #1572:
URL: https://github.com/apache/samza/pull/1572


   Symptoms: Containers sometimes take longer than usual to restore from a 
checkpoint. 
   
   Cause: RocksDB stalls write operations so that compaction can catch up when 
there are a large number of level-0 SST files. By default, RocksDB uses a 
single compaction thread. We saw some jobs blocked on container restore while 
RocksDB took a long time to complete compaction. A similar issue and some 
background are described [here](https://github.com/facebook/rocksdb/issues/3717).
   
   Fix: We set the `rocksdb.compaction.max.background.compactions` config for 
Samza jobs to 4. 
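   
   As a rough illustration of the fix, the sketch below resolves the compaction 
thread count from a store's config map. It assumes that 4 acts as the default 
when the key is absent and that user-supplied values override it; the class 
name `CompactionConfigSketch` and the helper `maxBackgroundCompactions` are 
hypothetical and are not taken from the Samza code in this PR.

```java
import java.util.HashMap;
import java.util.Map;

public class CompactionConfigSketch {
    // Config key named in this PR description.
    static final String KEY = "rocksdb.compaction.max.background.compactions";
    // Assumption: 4 is used when the job does not set the key explicitly.
    static final int DEFAULT_MAX_BACKGROUND_COMPACTIONS = 4;

    // Hypothetical helper: returns the number of background compactions to allow.
    static int maxBackgroundCompactions(Map<String, String> storeConfig) {
        String raw = storeConfig.get(KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_MAX_BACKGROUND_COMPACTIONS;
        }
        int value = Integer.parseInt(raw.trim());
        if (value < 1) {
            throw new IllegalArgumentException(KEY + " must be >= 1, got " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> storeConfig = new HashMap<>();
        // Key absent: falls back to the assumed default.
        System.out.println(maxBackgroundCompactions(storeConfig));
        // Key present: the explicit value is used.
        storeConfig.put(KEY, "8");
        System.out.println(maxBackgroundCompactions(storeConfig));
    }
}
```

   In a real job the resolved value would presumably be handed to the RocksDB 
Java options (for example via `Options.setMaxBackgroundCompactions(int)`), which 
this sketch leaves out so that it runs without the RocksDB dependency.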
   
   Additional changes: Log a warning instead of throwing an exception on the 
first backup when Checkpoint v2 is enabled for a job.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

