[ https://issues.apache.org/jira/browse/HADOOP-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16300112#comment-16300112 ]
Steve Loughran commented on HADOOP-13282:
-----------------------------------------
I think that checksum preservation is all about HDFS; certainly that's where it
crops up. There are enums of checksum options and the like. I'm avoiding it for
now.
> S3 blob etags to be made visible in status/getFileChecksum() calls
> ------------------------------------------------------------------
>
> Key: HADOOP-13282
> URL: https://issues.apache.org/jira/browse/HADOOP-13282
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.9.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Minor
> Attachments: HADOOP-13282-001.patch, HADOOP-13282-002.patch,
> HADOOP-13282-003.patch, HADOOP-13282-004.patch
>
>
> If the etags of blobs were exported via {{getFileChecksum()}}, it'd be
> possible to probe for a blob being in sync with a local file. Distcp could
> use this to decide whether to skip a file or not.
> Now, there's a problem there: distcp needs the source and dest filesystems to
> implement the same checksum algorithm. It'd only work out of the box if you
> were copying between S3 instances. There are also quirks with encryption and
> multipart uploads:
> [s3 docs|http://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html].
> At the very least, it's something which could be used when indexing the FS,
> to check for changes later.
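A minimal sketch of the "is this blob in sync with a local file" probe from the description, in Java since that is Hadoop's language. It assumes the object was written with a single PUT and is either unencrypted or SSE-S3 encrypted, which is the case where the linked S3 docs say the etag is an MD5 digest of the object data; SSE-C, SSE-KMS and multipart objects are outside that assumption. The class and method names (SinglePartEtag, md5Hex, inSync) are hypothetical and not part of the Hadoop or AWS APIs.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch. ASSUMPTION: single-PUT upload, plaintext or SSE-S3,
// so the S3 etag is the hex MD5 of the object data.
final class SinglePartEtag {

    private SinglePartEtag() {}

    /** Lower-case hex MD5 of the data. */
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    /** S3 returns etags wrapped in double quotes; strip them before comparing. */
    static String normalize(String etag) {
        return etag.replace("\"", "");
    }

    /**
     * True only if the local data provably matches the remote etag.
     * An etag containing '-' is commonly seen for multipart uploads and is
     * not an MD5 of the data, so it is reported as "not in sync".
     */
    static boolean inSync(byte[] localData, String remoteEtag) {
        String etag = normalize(remoteEtag);
        if (etag.contains("-")) {
            return false; // cannot confirm: treat as changed and copy
        }
        return etag.equalsIgnoreCase(md5Hex(localData));
    }

    public static void main(String[] args) {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(md5Hex(data)); // 5d41402abc4b2a76b9719d911017c592
        System.out.println(inSync(data, "\"5d41402abc4b2a76b9719d911017c592\"")); // true
    }
}
```

Answering "not in sync" whenever the etag cannot be checked errs on the side of copying rather than skipping, which is the safe default for a distcp-style decision.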
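The multipart quirk also illustrates why source and dest need the same algorithm. Multipart etags are widely observed to be the MD5 of the concatenated binary MD5s of each part, followed by "-" and the part count; AWS does not document this as a stable contract, so treat the sketch below as resting on that assumption. Even then, a local file's etag can only be reproduced if the part size used by the uploader is known. MultipartEtag and compute are hypothetical names.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical sketch. ASSUMPTION: multipart etag =
// hex(MD5(md5(part1) || ... || md5(partN))) + "-" + N,
// a widely observed but undocumented format.
final class MultipartEtag {

    private MultipartEtag() {}

    private static MessageDigest md5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    /** Computes the assumed multipart etag for data uploaded in parts of partSize bytes. */
    static String compute(byte[] data, int partSize) {
        if (partSize <= 0 || data.length == 0) {
            throw new IllegalArgumentException("need a positive part size and non-empty data");
        }
        MessageDigest outer = md5();
        int parts = 0;
        // long offsets avoid int overflow for very large arrays or part sizes
        for (long offset = 0; offset < data.length; offset += partSize) {
            int from = (int) offset;
            int to = (int) Math.min(data.length, offset + partSize);
            // Digest each part on its own, then feed the raw 16-byte digest to the outer MD5.
            outer.update(md5().digest(Arrays.copyOfRange(data, from, to)));
            parts++;
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : outer.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.append('-').append(parts).toString();
    }

    public static void main(String[] args) {
        byte[] data = new byte[10];
        // Same bytes, different part sizes: the etags differ.
        System.out.println(compute(data, 5));  // ends in "-2"
        System.out.println(compute(data, 10)); // ends in "-1"
    }
}
```

Hashing the same ten bytes with two part sizes gives two different etags, so a checksum comparison is only meaningful when both sides used identical parameters, let alone the same algorithm.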
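For the indexing use case, the etag can be treated as an opaque version marker: record it during one pass over the FS and compare on the next. A hypothetical sketch, where EtagIndex, record and changedSince are illustrative names and the current listing is assumed to be a path-to-etag map obtained elsewhere:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: remember path -> etag from one indexing pass and
// report paths that are new, modified or deleted in a later listing.
// Etags are compared as opaque strings; no checksum algorithm is assumed.
final class EtagIndex {
    private final Map<String, String> etagsByPath = new HashMap<>();

    void record(String path, String etag) {
        etagsByPath.put(path, etag);
    }

    /** Sorted set of paths that were added, modified or removed since indexing. */
    Set<String> changedSince(Map<String, String> currentListing) {
        Set<String> changed = new TreeSet<>();
        for (Map.Entry<String, String> e : currentListing.entrySet()) {
            if (!e.getValue().equals(etagsByPath.get(e.getKey()))) {
                changed.add(e.getKey()); // new or modified
            }
        }
        for (String path : etagsByPath.keySet()) {
            if (!currentListing.containsKey(path)) {
                changed.add(path); // deleted
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        EtagIndex index = new EtagIndex();
        index.record("s3a://bucket/a", "\"111\"");
        index.record("s3a://bucket/b", "\"222\"");
        Map<String, String> later = new HashMap<>();
        later.put("s3a://bucket/a", "\"111\""); // unchanged
        later.put("s3a://bucket/b", "\"333\""); // rewritten
        later.put("s3a://bucket/c", "\"444\""); // new
        System.out.println(index.changedSince(later)); // [s3a://bucket/b, s3a://bucket/c]
    }
}
```

Because the etag is compared as an opaque string, nothing about the checksum algorithm needs to be known. The trade-off is false positives: re-uploading identical content by a different route (for example multipart instead of a single PUT) changes the etag, so the path is reported as changed.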
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)