[jira] [Commented] (FLINK-10841) Reduce the number of ListObjects calls when checkpointing to S3

Paul S (Jira) Tue, 12 Jan 2021 08:28:24 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17263460#comment-17263460
 ]


Paul S commented on FLINK-10841:
--------------------------------

> Does this help? 

Massively yes, thank you! :) 
I wasn't aware this was an option. 
I've tested setting these configs on Flink jobs:

-----

fs.s3a.endpoint: http://mycustomendpoint.local
s3.endpoint: http://mycustomendpoint.local
state.checkpoints.dir: s3a://bucketname/flink-jobs
state.savepoints.dir: s3a://bucketname/flink-savepoints

-----

And so far they seem to be running. 
I'll monitor the performance aspects and, if all is OK, I won't report back. (y)

> Reduce the number of ListObjects calls when checkpointing to S3
> ---------------------------------------------------------------
>
>                 Key: FLINK-10841
>                 URL: https://issues.apache.org/jira/browse/FLINK-10841
>             Project: Flink
>          Issue Type: Improvement
>          Components: FileSystems
>    Affects Versions: 1.5.5, 1.6.2
>            Reporter: Pawel Bartoszek
>            Priority: Minor
>
> With S3 configured as the checkpoint store using the S3 AWS Hadoop filesystem, we see
> loads of ListObjects calls. For instance, a job with ~1600 tasks requires
> around 23000 ListObjects calls for every checkpoint, including its cleanup
> by Flink. With the checkpoint interval set to 5 minutes, this adds up to hundreds
> of dollars per month just for ListObjects calls. I am aware that the
> implementation details might be hidden in the Hadoop jar and may be difficult to
> change, but could a workaround at least be suggested?
>  
>  
>  
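
For a rough sense of scale, the figures quoted in the issue description (about 23000 ListObjects calls per checkpoint, one checkpoint every 5 minutes) can be turned into a monthly estimate. This is only a back-of-the-envelope sketch: the function name, the 30-day month, and the price of 0.005 USD per 1000 LIST requests are assumptions for illustration and are not taken from the issue. Actual pricing depends on region and storage class.

```python
def monthly_list_cost(calls_per_checkpoint, interval_minutes,
                      price_per_1000_usd, days=30):
    """Estimate monthly ListObjects calls and their cost.

    price_per_1000_usd is an assumed price; check your provider's
    current pricing before relying on the result.
    """
    # Number of checkpoints taken in the assumed month.
    checkpoints = days * 24 * 60 // interval_minutes
    # Total ListObjects calls across all checkpoints.
    calls = calls_per_checkpoint * checkpoints
    # Requests are billed per 1000 calls.
    cost = calls / 1000 * price_per_1000_usd
    return calls, cost


# 23000 calls per checkpoint and a 5-minute interval come from the issue;
# 0.005 USD per 1000 requests is an assumed price.
calls, cost = monthly_list_cost(23_000, 5, 0.005)
print(f"{calls:,} ListObjects calls, about ${cost:,.2f} per month")
```

Under these assumptions the estimate is in the same range as the "hundreds of dollars per month" the reporter describes. Because the cost scales linearly, any reduction in calls per checkpoint lowers it proportionally.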



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
