Pawel Bartoszek created FLINK-10841:
---------------------------------------
Summary: Reduce the number of ListObjects calls when checkpointing to S3
Key: FLINK-10841
URL: https://issues.apache.org/jira/browse/FLINK-10841
Project: Flink
Issue Type: Improvement
Components: FileSystem
Affects Versions: 1.6.2, 1.5.5
Reporter: Pawel Bartoszek
With S3 configured as the checkpoint store through the S3 AWS Hadoop filesystem,
we see a very large number of ListObjects calls. For instance, a job with ~1600
tasks requires around 23000 ListObjects calls for every checkpoint, including
Flink's cleanup of that checkpoint. With the checkpoint interval set to 5
minutes, this adds up to hundreds of dollars per month just for ListObjects
calls. I am aware that the implementation details might be hidden in the Hadoop
jar and may be difficult to change, but could a workaround at least be
suggested?
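
For a rough sense of scale, the arithmetic behind the figures above can be sketched as follows. The call count and checkpoint interval come from this report. The 30-day month and the LIST request price of $0.005 per 1,000 requests are assumptions made for illustration only, and the real price depends on region and storage class.

```python
# Back-of-the-envelope estimate of the monthly ListObjects bill for checkpointing.
# From the report: ~23000 ListObjects calls per checkpoint, 5-minute interval.
# ASSUMPTION (not from the report): $0.005 per 1,000 LIST requests.
# ASSUMPTION (not from the report): a 30-day month.

CALLS_PER_CHECKPOINT = 23_000
CHECKPOINT_INTERVAL_MIN = 5
DAYS_PER_MONTH = 30
PRICE_PER_1000_LIST_USD = 0.005


def monthly_list_cost(calls_per_checkpoint, interval_min, price_per_1000,
                      days=DAYS_PER_MONTH):
    """Return (checkpoints per month, LIST calls per month, cost in USD)."""
    checkpoints = days * 24 * 60 // interval_min
    calls = calls_per_checkpoint * checkpoints
    cost = calls / 1000 * price_per_1000
    return checkpoints, calls, cost


checkpoints, calls, cost = monthly_list_cost(
    CALLS_PER_CHECKPOINT, CHECKPOINT_INTERVAL_MIN, PRICE_PER_1000_LIST_USD
)
print(f"checkpoints per month: {checkpoints}")
print(f"ListObjects calls per month: {calls}")
print(f"estimated cost per month: ${cost:.2f}")
```

Under these assumptions, a single job makes 8640 checkpoints and roughly 199 million ListObjects calls per month, so the cost grows linearly with both the per-checkpoint call count and the checkpoint frequency.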