[
https://issues.apache.org/jira/browse/HADOOP-14837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496047#comment-17496047
]
Ahmar Suhail commented on HADOOP-14837:
---------------------------------------
[[email protected]] I've been looking at this and had a few questions:
* For better reporting, do we want to add a new statistic, something like
`objects_in_glacier`, which would hold the count of objects currently in glacier?
* In listings, we can add a new option to filter out glacier files by doing
something like `!summary.getStorageClass().equals("GLACIER")` in the acceptor
[here|https://github.com/apache/hadoop/blob/365375412fe5eea82549630ee8c5598502b95caf/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Listing.java#L770]?
With the filter enabled, `getContentSummary()` would no longer include glacier
files in the fileCount.
* To return StorageType.ARCHIVE for a file, I was looking at
getBlockLocations. It currently returns something like `BlockLocation({
"localhost:9866" }, { "localhost" }, 0, file.getLen())`, so I'm not sure how we
want it to behave when implemented in S3AFS. Would it be something like
`BlockLocation({ filepath }, { StorageType.ARCHIVE.toString() }, 0,
file.getLen())`?
* Do we want to implement retrieval in open()? If yes, will the behaviour be:
** If fs.s3a.open.glacier.retrieve is enabled, check if the file is in glacier;
if it is, initiate a restore
** If the restore has not completed and .read() is called, throw "cannot read
yet - retrieval requested"
** If the restore has not been initiated (can happen when
fs.s3a.open.glacier.retrieve is false) and .read() is called, throw "cannot read
data in glacier"
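
To make the listing filter and the open()/read() behaviour in the bullets above concrete, here is a rough sketch. It uses stand-in types rather than the real S3A classes: `GlacierSketch`, `Summary`, `accept`, `filter`, `open` and `checkReadable` are all hypothetical names for illustration, and the restore itself is only simulated by a flag where a real implementation would send a restore request to S3.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: none of these types are the real S3A classes. */
class GlacierSketch {

  /** Hypothetical stand-in for one entry of an S3 listing. */
  static final class Summary {
    final String key;
    final String storageClass;      // e.g. "STANDARD" or "GLACIER"
    final boolean restoreRequested; // a restore has been initiated
    final boolean restoreComplete;  // the restored copy is readable

    Summary(String key, String storageClass,
        boolean restoreRequested, boolean restoreComplete) {
      this.key = key;
      this.storageClass = storageClass;
      this.restoreRequested = restoreRequested;
      this.restoreComplete = restoreComplete;
    }
  }

  /** Acceptor check: constant-first equals() tolerates a null storage class. */
  static boolean accept(Summary s) {
    return !"GLACIER".equals(s.storageClass);
  }

  /** Listing with glacier entries filtered out. */
  static List<Summary> filter(List<Summary> listing) {
    List<Summary> kept = new ArrayList<>();
    for (Summary s : listing) {
      if (accept(s)) {
        kept.add(s);
      }
    }
    return kept;
  }

  /**
   * open(): if retrieval is enabled and the object is in glacier with no
   * restore in progress, mark a restore as requested.
   */
  static Summary open(Summary s, boolean retrieveEnabled) {
    if (retrieveEnabled && !accept(s) && !s.restoreRequested) {
      // A real implementation would issue the restore request to S3 here.
      return new Summary(s.key, s.storageClass, true, false);
    }
    return s;
  }

  /** read(): fail with one of the two messages proposed in the bullets. */
  static void checkReadable(Summary s) throws IOException {
    if (accept(s) || s.restoreComplete) {
      return; // not in glacier, or already restored
    }
    if (s.restoreRequested) {
      throw new IOException("cannot read yet - retrieval requested: " + s.key);
    }
    throw new IOException("cannot read data in glacier: " + s.key);
  }
}
```

In this sketch, `checkReadable(open(s, false))` on a glacier object fails with "cannot read data in glacier", while `checkReadable(open(s, true))` fails with "cannot read yet - retrieval requested" until the restore is marked complete.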
> Handle S3A "glacier" data
> -------------------------
>
> Key: HADOOP-14837
> URL: https://issues.apache.org/jira/browse/HADOOP-14837
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.0.0-beta1
> Reporter: Steve Loughran
> Priority: Minor
>
> SPARK-21797 covers how, if you have AWS S3 set to copy some files to glacier,
> they appear in the listing but GETs fail, and so does everything else.
> We should think about how best to handle this.
> # report better
> # if listings can identify files which are glaciated then maybe we could have
> an option to filter them out
> # test & see what happens