[ 
https://issues.apache.org/jira/browse/FLINK-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262719#comment-17262719
 ] 

Ufuk Celebi commented on FLINK-10841:
-------------------------------------

[~uberspot] Presto's S3 FileSystem seems to issue only ListObjects v1 
requests [1], whereas newer versions of the Hadoop S3 FileSystem should use 
v2 requests by default [2]. If both file system plugins are on the classpath, 
you have to explicitly pick Hadoop via the {{s3a://}} scheme. See also: 
[https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/filesystems/s3.html#hadooppresto-s3-file-systems-plugins]
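
As a minimal sketch of what picking Hadoop explicitly could look like in 
{{flink-conf.yaml}}, assuming the Hadoop S3 plugin is installed and using a 
hypothetical bucket and path (both placeholders, not taken from this issue):

```yaml
# Assumption: "my-bucket" and the paths below are placeholders.
# The s3a:// scheme selects the Hadoop S3 file system even when the
# Presto plugin is also on the classpath.
state.checkpoints.dir: s3a://my-bucket/flink/checkpoints
state.savepoints.dir: s3a://my-bucket/flink/savepoints
```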

Does this help?

[1] 
https://github.com/prestodb/presto/blob/master/presto-hive/src/main/java/com/facebook/presto/hive/s3/PrestoS3FileSystem.java#L523-L527

[2] 
https://github.com/apache/hadoop/blob/rel/release-3.1.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L297-L302

> Reduce the number of ListObjects calls when checkpointing to S3
> ---------------------------------------------------------------
>
>                 Key: FLINK-10841
>                 URL: https://issues.apache.org/jira/browse/FLINK-10841
>             Project: Flink
>          Issue Type: Improvement
>          Components: FileSystems
>    Affects Versions: 1.5.5, 1.6.2
>            Reporter: Pawel Bartoszek
>            Priority: Minor
>
> With S3 configured as the checkpoint store using the S3 AWS Hadoop 
> filesystem, we see loads of ListObjects calls. For instance, a job with 
> ~1600 tasks requires around 23000 ListObjects calls for every checkpoint, 
> including the cleanup done by Flink. With the checkpoint interval set to 
> 5 minutes, this adds up to hundreds of dollars per month just for 
> ListObjects calls. I am aware that implementation details might be hidden 
> in the Hadoop jar and may be difficult to change, but maybe at least some 
> workaround could be suggested?
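
To put the reported numbers in perspective, here is a rough back-of-the-envelope 
sketch. The request price of 0.005 USD per 1,000 LIST requests and the 30-day 
month are assumptions, not figures from this issue; actual pricing depends on 
region and storage class. The function name is made up for illustration.

```python
def monthly_list_cost(calls_per_checkpoint, interval_minutes,
                      usd_per_1000_requests, days=30):
    """Estimate ListObjects calls and their cost over `days` days."""
    # Number of checkpoints triggered in the period.
    checkpoints = days * 24 * 60 // interval_minutes
    total_calls = calls_per_checkpoint * checkpoints
    cost_usd = total_calls * usd_per_1000_requests / 1000
    return total_calls, cost_usd


# 23000 calls per checkpoint and a 5-minute interval come from the issue;
# the price per 1,000 requests is an assumed value.
calls, cost = monthly_list_cost(23000, 5, 0.005)
print(f"{calls:,} ListObjects calls, about {cost:,.2f} USD per month")
```

Under these assumptions the estimate lands in the same order of magnitude as 
the cost described in the issue.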



--
This message was sent by Atlassian Jira
(v8.3.4#803005)