[ https://issues.apache.org/jira/browse/FLINK-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17263460#comment-17263460 ]
Paul S commented on FLINK-10841:
--------------------------------
> Does this help?
Massively yes, thank you! :)
I wasn't aware this was an option.
I've tested setting these configs on Flink jobs:
-----
fs.s3a.endpoint: http://mycustomendpoint.local
s3.endpoint: http://mycustomendpoint.local
state.checkpoints.dir: s3a://bucketname/flink-jobs
state.savepoints.dir: s3a://bucketname/flink-savepoints
-----
So far the jobs seem to be running.
I'll keep monitoring performance, and if all is OK I won't report back. (y)
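Setting both endpoint keys should be redundant rather than conflicting. Flink's documentation for its S3 filesystem plugins says that keys written with an `s3.` prefix in flink-conf.yaml are translated internally to Hadoop's `fs.s3a.` keys (e.g. `s3.connection.maximum` becomes `fs.s3a.connection.maximum`). The sketch below is a simplified model of that translation, not Flink's actual code: the names `to_hadoop_conf` and `FLINK_PREFIXES` are hypothetical, and the assumption that `fs.s3a.` keys pass through unchanged is mine.

```python
# Simplified, hypothetical model of the documented "s3.* -> fs.s3a.*"
# key translation in Flink's S3 filesystem plugins. The prefix list is an
# assumption for illustration, not copied from Flink's source.
FLINK_PREFIXES = ("s3.", "fs.s3a.")
HADOOP_PREFIX = "fs.s3a."


def to_hadoop_conf(flink_conf: dict[str, str]) -> dict[str, str]:
    """Return the fs.s3a.* keys a Hadoop S3A client would end up seeing."""
    hadoop_conf: dict[str, str] = {}
    for key, value in flink_conf.items():
        for prefix in FLINK_PREFIXES:
            if key.startswith(prefix):
                # Strip the Flink-side prefix and re-add the Hadoop one.
                hadoop_conf[HADOOP_PREFIX + key[len(prefix):]] = value
                break
    return hadoop_conf


conf = {
    "fs.s3a.endpoint": "http://mycustomendpoint.local",
    "s3.endpoint": "http://mycustomendpoint.local",
    "state.checkpoints.dir": "s3a://bucketname/flink-jobs",
    "state.savepoints.dir": "s3a://bucketname/flink-savepoints",
}
# Both endpoint keys collapse into a single fs.s3a.endpoint entry;
# the state.* keys are Flink settings and are not forwarded.
print(to_hadoop_conf(conf))
```

Under this model, either endpoint line alone would be enough; keeping both only matters if they ever disagree.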
> Reduce the number of ListObjects calls when checkpointing to S3
> ---------------------------------------------------------------
>
> Key: FLINK-10841
> URL: https://issues.apache.org/jira/browse/FLINK-10841
> Project: Flink
> Issue Type: Improvement
> Components: FileSystems
> Affects Versions: 1.5.5, 1.6.2
> Reporter: Pawel Bartoszek
> Priority: Minor
>
> With S3 configured as the checkpoint store using the S3 AWS Hadoop filesystem, we see
> loads of ListObjects calls. For instance, a job with ~1600 tasks requires
> around 23000 ListObjects calls for every checkpoint, including the cleanup
> done by Flink. With the checkpoint interval set to 5 minutes, this adds up to hundreds
> of dollars per month just for ListObjects calls. I am aware that the
> implementation details might be hidden in the Hadoop jar and may be difficult to
> change, but could at least a workaround be suggested?
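The cost figure in the issue can be checked with a back-of-the-envelope calculation. This sketch takes the 23000 calls per checkpoint and the 5-minute interval from the issue; the 30-day month and the price of $0.005 per 1,000 LIST requests (S3 Standard, us-east-1) are assumptions, so check current pricing for your region. `monthly_list_cost` is a hypothetical helper.

```python
# Rough monthly cost of S3 ListObjects calls caused by checkpointing.
LIST_CALLS_PER_CHECKPOINT = 23_000  # from the issue, including cleanup
CHECKPOINT_INTERVAL_MIN = 5         # from the issue
DAYS_PER_MONTH = 30                 # assumption
PRICE_PER_1000_LIST_USD = 0.005     # assumption: S3 Standard LIST price


def monthly_list_cost(calls_per_checkpoint: int, interval_min: int,
                      days: int = DAYS_PER_MONTH,
                      price_per_1000: float = PRICE_PER_1000_LIST_USD
                      ) -> tuple[int, float]:
    """Return (ListObjects calls per month, cost in USD)."""
    checkpoints = days * 24 * 60 // interval_min  # 8640 for 5-minute intervals
    calls = calls_per_checkpoint * checkpoints
    return calls, calls / 1000 * price_per_1000


calls, cost = monthly_list_cost(LIST_CALLS_PER_CHECKPOINT, CHECKPOINT_INTERVAL_MIN)
print(f"{calls:,} ListObjects calls/month, about ${cost:,.2f}")
```

With these assumptions the job issues roughly 199 million LIST requests a month, a bill of close to $1,000, which is in line with the "hundreds of dollars" reported in the issue. Halving the number of calls per checkpoint or doubling the interval halves the cost linearly.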
--
This message was sent by Atlassian Jira
(v8.3.4#803005)