[https://issues.apache.org/jira/browse/HADOOP-16259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16820323#comment-16820323]
Steve Loughran commented on HADOOP-16259:
-----------------------------------------
# No need to worry about s3n or s3; both are gone from trunk.
# And yes, your work will have to go into trunk.
There are a couple of related JIRAs:
* See HADOOP-12020 for the reduced-storage-layer discussion; it's fairly
straightforward (see the sketch after this list). If you want to add that
option *with tests*, you're welcome to do it.
* HADOOP-14837 covers identifying glaciated files, with the proposal that
{{BlockLocation}} instances created for S3 files add the storage class.
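As promised in the HADOOP-12020 bullet, here's a minimal sketch of what "just"
setting the storage class looks like with the AWS SDK for Java v1. The bucket,
key and file names are placeholders, and a real S3A change would go through its
own request plumbing rather than a bare client:
{code:java}
import java.io.File;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.PutObjectRequest;
import com.amazonaws.services.s3.model.StorageClass;

public class PutWithStorageClass {
  public static void main(String[] args) {
    AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
    // withStorageClass() sets the x-amz-storage-class header on the PUT;
    // ReducedRedundancy is the HADOOP-12020 case.
    PutObjectRequest request =
        new PutObjectRequest("my-bucket", "path/to/object", new File("data.bin"))
            .withStorageClass(StorageClass.ReducedRedundancy);
    s3.putObject(request);
  }
}
{code}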
Glacier is trouble, as the files show up in a LIST but a HEAD/GET fails. So how
do you deal with it in queries? Not directly my problem (Hive, Spark); I'll
give them visibility into the issue, so they can choose how to react.
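To make that failure mode concrete, here's a minimal sketch (AWS SDK for Java
v1; bucket and key are placeholders) of the {{InvalidObjectState}} error S3
raises when you try to read a glaciated object:
{code:java}
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.AmazonS3Exception;

public class GlacierProbe {
  public static void main(String[] args) throws java.io.IOException {
    AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
    try {
      // Reading an object archived to Glacier fails with a 403 and error
      // code InvalidObjectState until the object has been restored.
      s3.getObject("my-bucket", "path/to/object").getObjectContent().close();
    } catch (AmazonS3Exception e) {
      if ("InvalidObjectState".equals(e.getErrorCode())) {
        System.err.println("Glaciated object; restore it before reading: "
            + e.getErrorMessage());
      } else {
        throw e;
      }
    }
  }
}
{code}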
How do you upload data to Glacier? If you want to do it directly, you can't use
the S3 client; you need {{AmazonGlacierClient}}. See
https://docs.aws.amazon.com/amazonglacier/latest/dev/getting-started-upload-archive.html
for details. This makes it a much more complex operation than "just" setting
the {{x-amz-storage-class}} header.
To repeat: you cannot upload data to S3 and say that it goes directly to
Glacier. Instead you upload as normal and set a lifecycle rule on the bucket to
archive after 24h. With that 24h delay, you can still get your data up.
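A minimal sketch of such a lifecycle rule via the SDK (bucket name and prefix
are placeholders; the same rule can equally be set in the console or CLI):
{code:java}
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.BucketLifecycleConfiguration;
import com.amazonaws.services.s3.model.StorageClass;
import com.amazonaws.services.s3.model.lifecycle.LifecycleFilter;
import com.amazonaws.services.s3.model.lifecycle.LifecyclePrefixPredicate;

public class ArchiveLifecycle {
  public static void main(String[] args) {
    AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
    // Transition everything under distcp/ to Glacier one day after it
    // lands, matching the 24h delay described above.
    BucketLifecycleConfiguration.Rule rule =
        new BucketLifecycleConfiguration.Rule()
            .withId("archive-after-24h")
            .withFilter(new LifecycleFilter(
                new LifecyclePrefixPredicate("distcp/")))
            .addTransition(new BucketLifecycleConfiguration.Transition()
                .withDays(1)
                .withStorageClass(StorageClass.Glacier))
            .withStatus(BucketLifecycleConfiguration.ENABLED);
    s3.setBucketLifecycleConfiguration("my-bucket",
        new BucketLifecycleConfiguration().withRules(rule));
  }
}
{code}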
Returning to your distcp proposal, then:
* Upload to Glacier: WONTFIX.
* Upload to reduced storage: HADOOP-12020. Please fix it for us!
All work has to be in trunk; for distcp to work properly you also need the
{{-direct}} option, which isn't going to be backported. I doubt this will be
either. Changing target version to 3.2.x.
> Distcp to set S3 Storage Class
> ------------------------------
>
> Key: HADOOP-16259
> URL: https://issues.apache.org/jira/browse/HADOOP-16259
> Project: Hadoop Common
> Issue Type: New Feature
> Components: hadoop-aws
> Affects Versions: 2.8.4
> Reporter: Prakash Gopalsamy
> Priority: Minor
> Labels: aws-s3, distcp
> Attachments: ENHANCE_HADOOP_DISTCP_FOR_CUSTOM_S3_STORAGE_CLASS.docx
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> The Hadoop distcp implementation doesn't have properties to override the
> storage class while transferring data to Amazon S3 storage; it doesn't set
> any storage class at all. Because of this, all objects moved from a cluster
> to S3 using Hadoop distcp are stored in the default storage class,
> "STANDARD". A new feature to override the default S3 storage class through
> configuration properties would make it possible to upload objects in other
> storage classes. I have come up with a design to implement this feature and
> uploaded a design document to this JIRA. Kindly review it and let me know
> your suggestions.