One more thing: I am aware of an older thread about the RocksDB backend and
EBS that might be interesting for you:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/checkpoint-stuck-with-rocksdb-statebackend-and-s3-filesystem-td18679.html

> On 25.05.2018, at 09:59, Stefan Richter <s.rich...@data-artisans.com> wrote:
> 
> Hi,
> 
> if the problem indeed comes from the reads, I think incremental checkpoints 
> are less likely to be the cause. What Flink version are you using? Since you 
> mentioned the use of map state, a potential cause that comes to my mind is 
> described in this issue: https://issues.apache.org/jira/browse/FLINK-8639. 
> This was improved recently. Does the problem also exist for jobs without map 
> state?
> 
> Best,
> Stefan
> 
>> On 24.05.2018, at 20:25, Stephan Ewen <se...@apache.org> wrote:
>> 
>> One thing that you can always do is disable fsync, because Flink does not 
>> rely on RocksDB's fsync for persistence.
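>> 
>> A minimal sketch of how that could be done through Flink's OptionsFactory 
>> hook (the rocksDbBackend variable is a placeholder; setUseFsync is the 
>> standard RocksDB DBOptions setter):
>> 
>> import org.apache.flink.contrib.streaming.state.OptionsFactory;
>> import org.rocksdb.ColumnFamilyOptions;
>> import org.rocksdb.DBOptions;
>> 
>> rocksDbBackend.setOptions(new OptionsFactory() {
>>     @Override
>>     public DBOptions createDBOptions(DBOptions currentOptions) {
>>         // Flink's checkpoints provide durability, so RocksDB's own
>>         // fsync can be skipped.
>>         return currentOptions.setUseFsync(false);
>>     }
>> 
>>     @Override
>>     public ColumnFamilyOptions createColumnFamilyOptions(
>>             ColumnFamilyOptions currentOptions) {
>>         return currentOptions;
>>     }
>> });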
>> 
>> If you disable incremental checkpoints, does that help?
>> If yes, it could be a problem with too many small SSTable files created by 
>> incremental checkpoints (an issue we have on the roadmap to fix).
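>> 
>> To test that, the backend can be constructed with incremental checkpointing 
>> switched off, roughly like this (the checkpoint URI and the env variable are 
>> placeholders):
>> 
>> import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
>> 
>> // The second constructor argument toggles incremental checkpointing.
>> RocksDBStateBackend rocksDbBackend =
>>         new RocksDBStateBackend("file:///path/to/checkpoints", false);
>> env.setStateBackend(rocksDbBackend);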
>> 
>> On Thu, May 24, 2018 at 3:52 PM, Piotr Nowojski <pi...@data-artisans.com> wrote:
>> Hi,
>> 
>> This issue might have something to do with compaction. Problems with 
>> compaction can especially degrade read performance (or simply increase read 
>> IO). Have you tried forcing more compactions or changing the 
>> CompactionStyle?
>> 
>> Have you taken a look at 
>> org.apache.flink.contrib.streaming.state.PredefinedOptions?
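>> 
>> For example, a quick sketch of both knobs (rocksDbBackend is a placeholder 
>> for your configured RocksDBStateBackend):
>> 
>> import org.apache.flink.contrib.streaming.state.PredefinedOptions;
>> import org.rocksdb.CompactionStyle;
>> 
>> // Apply a predefined profile tuned for spinning disks with extra memory.
>> rocksDbBackend.setPredefinedOptions(
>>         PredefinedOptions.SPINNING_DISK_OPTIMIZED_HIGH_MEM);
>> 
>> // Alternatively, the compaction style can be changed inside a custom
>> // OptionsFactory's createColumnFamilyOptions, e.g.:
>> //     currentOptions.setCompactionStyle(CompactionStyle.LEVEL);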
>> 
>> Maybe Stefan or Andrey could share more input on this.
>> 
>> Piotrek
>> 
>> 
>> > On 22 May 2018, at 08:12, Govindarajan Srinivasaraghavan 
>> > <govindragh...@gmail.com> wrote:
>> > 
>> > Hi All,
>> > 
>> > We are running Flink in AWS and are observing strange behavior. We are 
>> > using Docker containers, EBS for storage, and the RocksDB state backend. 
>> > We have a few map and value states, with checkpointing every 30 seconds 
>> > and incremental checkpointing turned on. The issue we are noticing is that 
>> > the read IOPS and read throughput gradually increase over time and keep 
>> > constantly growing. The write throughput and write bytes are not 
>> > increasing as much as the reads. The checkpoints are written to durable 
>> > NFS storage. We are not sure what is causing this constant increase in 
>> > read throughput, but because of it we are running out of EBS burst balance 
>> > and need to restart the job every once in a while. I have attached the EBS 
>> > read and write metrics. Has anyone encountered this issue, and what could 
>> > be a possible solution?
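>> > 
>> > For reference, the checkpointing setup looks roughly like this (the NFS 
>> > path is a placeholder):
>> > 
>> > env.enableCheckpointing(30000); // checkpoint every 30 seconds
>> > // 'true' turns on incremental checkpointing
>> > env.setStateBackend(
>> >         new RocksDBStateBackend("file:///mnt/nfs/checkpoints", true));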
>> > 
>> > We have also tried setting the RocksDB options below, but it didn't help. 
>> > (The setters in the first block belong to ColumnFamilyOptions and those in 
>> > the second to DBOptions; the labels were swapped in the original mail.)
>> > 
>> > ColumnFamilyOptions:
>> > currentOptions.setOptimizeFiltersForHits(true)
>> >         .setWriteBufferSize(536870912)       // 512 MB write buffer
>> >         .setMaxWriteBufferNumber(5)
>> >         .setMinWriteBufferNumberToMerge(2);
>> > 
>> > DBOptions:
>> > currentOptions.setMaxBackgroundCompactions(4)
>> >         .setMaxManifestFileSize(1048576)     // 1 MB
>> >         .setMaxLogFileSize(1048576);         // 1 MB
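>> > 
>> > For completeness, a sketch of how such options are typically wired in 
>> > through Flink's OptionsFactory (the rocksDbBackend variable is a 
>> > placeholder):
>> > 
>> > rocksDbBackend.setOptions(new OptionsFactory() {
>> >     @Override
>> >     public DBOptions createDBOptions(DBOptions currentOptions) {
>> >         return currentOptions.setMaxBackgroundCompactions(4)
>> >                 .setMaxManifestFileSize(1048576)
>> >                 .setMaxLogFileSize(1048576);
>> >     }
>> > 
>> >     @Override
>> >     public ColumnFamilyOptions createColumnFamilyOptions(
>> >             ColumnFamilyOptions currentOptions) {
>> >         return currentOptions.setOptimizeFiltersForHits(true)
>> >                 .setWriteBufferSize(536870912)
>> >                 .setMaxWriteBufferNumber(5)
>> >                 .setMinWriteBufferNumberToMerge(2);
>> >     }
>> > });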
>> > 
>> > Thanks.
>> 
>> 
> 
