[jira] [Commented] (FLINK-25200) Implement duplicating for s3 filesystem

Piotr Nowojski (Jira) Thu, 20 Jan 2022 07:11:08 -0800


    [ https://issues.apache.org/jira/browse/FLINK-25200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479437#comment-17479437 ]


Piotr Nowojski commented on FLINK-25200:
----------------------------------------

[~yunta], I'm not sure how much more information a more realistic test would 
give us. Yes, one thing not covered by [~akalashnikov]'s test is local IO. But 
when re-uploading a file instead of duplicating it, it's quite likely that the 
state file will already be in the file cache, for example.

Regardless, after looking at those results, I'm beginning to doubt whether it 
makes sense to provide native duplicate support for S3. It looks like the 
performance cost of both operations (re-uploading and CopyObject) on the AWS 
side is the same. I was hoping/expecting an orders-of-magnitude performance 
difference in favour of the CopyObject API.

> Implement duplicating for s3 filesystem
> ---------------------------------------
>
>                 Key: FLINK-25200
>                 URL: https://issues.apache.org/jira/browse/FLINK-25200
>             Project: Flink
>          Issue Type: Sub-task
>          Components: FileSystems
>            Reporter: Dawid Wysakowicz
>            Priority: Major
>             Fix For: 1.15.0
>
>
> We can use https://docs.aws.amazon.com/AmazonS3/latest/API/API_CopyObject.html



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
