shekhars-li opened a new pull request #1572: URL: https://github.com/apache/samza/pull/1572
Symptoms: Containers sometimes take longer than usual to restore from a checkpoint.

Cause: RocksDB stalls write operations to perform compaction when there is a large number of level-0 SST files. By default, RocksDB uses a single compaction thread. We saw some jobs blocked on container restore while RocksDB took a long time to complete compaction. A similar issue, with some background, is described [here](https://github.com/facebook/rocksdb/issues/3717).

Fix: We set the `rocksdb.compaction.max.background.compactions` config for Samza jobs to 4.

Additional changes: Replaced the exception with a warning on the first backup when Checkpoint v2 is enabled for a job.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
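
To illustrate the fix described above, here is a minimal sketch of how a store configuration could resolve the compaction thread count, falling back to 4 when the key is unset. The class `CompactionConfigSketch` and the helper `maxBackgroundCompactions` are hypothetical names for illustration and are not taken from the PR; only the key name and the value 4 come from the description. In a real store factory the resolved value would presumably be handed to the RocksDB options object (for example RocksJava's `Options.setMaxBackgroundCompactions`), but whether the PR wires it that way is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

public class CompactionConfigSketch {
    // Key name as given in the PR description.
    static final String MAX_BACKGROUND_COMPACTIONS_KEY =
        "rocksdb.compaction.max.background.compactions";

    // Value the PR description says is set for Samza jobs.
    static final int DEFAULT_MAX_BACKGROUND_COMPACTIONS = 4;

    // Hypothetical helper: returns the configured thread count,
    // or the default of 4 when the key is missing or blank.
    static int maxBackgroundCompactions(Map<String, String> storeConfig) {
        String raw = storeConfig.get(MAX_BACKGROUND_COMPACTIONS_KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_MAX_BACKGROUND_COMPACTIONS;
        }
        int value = Integer.parseInt(raw.trim());
        if (value < 1) {
            throw new IllegalArgumentException(
                MAX_BACKGROUND_COMPACTIONS_KEY + " must be >= 1, got " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> storeConfig = new HashMap<>();
        // No override: the default applies.
        System.out.println(maxBackgroundCompactions(storeConfig));

        // A job-level override replaces the default.
        storeConfig.put(MAX_BACKGROUND_COMPACTIONS_KEY, "8");
        System.out.println(maxBackgroundCompactions(storeConfig));
    }
}
```

More compaction threads let RocksDB clear level-0 files faster during restore, at the cost of extra CPU and I/O, so a job with tight resource limits may want to override the value.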
