[
https://issues.apache.org/jira/browse/HADOOP-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871726#comment-15871726
]
Steve Loughran commented on HADOOP-13282:
-----------------------------------------
For encryption, we'd have to return some {{FileChecksum}} instance which was
either random or null: some way to warn distcp that it shouldn't expect the
etags of encrypted files to be consistent. Having a value which varies with
the number of parts is even more complex. I think we should leave this alone
for now.
That said, being able to know the number of parts could be vaguely useful when
partitioning files, though without block lengths it is not that useful, and
probably a distraction to work on. You would never get the big speedup which
comes from scheduling work on the same host as the data, just the smaller
speedup which could come from reading a different block of the S3 filestore,
and so potentially less contention for the same data.
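
To make the two cases above concrete, here is a minimal sketch of how an etag
could be screened before being exposed as a checksum. Everything in it is an
assumption for illustration: {{EtagChecksums}}, {{partCount()}} and
{{comparableChecksum()}} are hypothetical names, not Hadoop or AWS SDK APIs;
the {{encrypted}} flag stands in for whatever the client knows about the
object; and it relies on the commonly observed convention that multipart
etags end in "-N", where N is the number of parts. Returning nothing for
multipart objects is a deliberately conservative choice here, not a proposal.

```java
import java.util.Optional;
import java.util.OptionalInt;

/** Hypothetical helper for illustration only; not part of Hadoop or the AWS SDK. */
final class EtagChecksums {
    private EtagChecksums() {}

    /** Strip the surrounding double quotes that an ETag header value usually carries. */
    static String normalize(String etag) {
        String e = etag.trim();
        if (e.length() >= 2 && e.startsWith("\"") && e.endsWith("\"")) {
            e = e.substring(1, e.length() - 1);
        }
        return e;
    }

    /**
     * Number of parts, if the etag carries the "-N" multipart suffix.
     * Assumption: multipart etags look like "hex-N"; anything else yields empty.
     */
    static OptionalInt partCount(String etag) {
        String e = normalize(etag);
        int dash = e.lastIndexOf('-');
        if (dash < 0 || dash == e.length() - 1) {
            return OptionalInt.empty();
        }
        try {
            int parts = Integer.parseInt(e.substring(dash + 1));
            return parts > 0 ? OptionalInt.of(parts) : OptionalInt.empty();
        } catch (NumberFormatException ex) {
            return OptionalInt.empty();
        }
    }

    /**
     * The etag as a value that is safe to compare between objects, or empty
     * when a caller such as distcp should not rely on it: the object is
     * encrypted, or the etag depends on how the upload was split into parts.
     */
    static Optional<String> comparableChecksum(String etag, boolean encrypted) {
        if (etag == null || encrypted) {
            return Optional.empty();
        }
        if (partCount(etag).isPresent()) {
            return Optional.empty();
        }
        return Optional.of(normalize(etag));
    }
}
```

A caller would treat an empty result the same way as a filesystem which
returns no checksum at all, and fall back to whatever other comparison it
already uses.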
> S3 blob etags to be made visible in status/getFileChecksum() calls
> ------------------------------------------------------------------
>
> Key: HADOOP-13282
> URL: https://issues.apache.org/jira/browse/HADOOP-13282
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.9.0
> Reporter: Steve Loughran
> Priority: Minor
>
> If the etags of blobs were exported via {{getFileChecksum()}}, it'd be
> possible to probe for a blob being in sync with a local file. Distcp could
> use this to decide whether to skip a file or not.
> Now, there's a problem there: distcp needs source and dest filesystems to
> implement the same algorithm. It'd only work out of the box if you were copying
> between S3 instances. There are also quirks with encryption and multipart:
> [s3
> docs|http://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html].
> At the very least, it's something which could be used when indexing the FS,
> to check for changes later.