[ 
https://issues.apache.org/jira/browse/HADOOP-14837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510508#comment-17510508
 ] 

Steve Loughran commented on HADOOP-14837:
-----------------------------------------

Good questions. I have no idea what the right answers are.

bq. For reporting better, do we want to add in a new statistic, something like 
`objects_in_glacier` which will have the count of objects currently in glacier?

Why not?

bq. In listings, we can add in a new option to filter out glacier files by 
doing something like `!summary.getStorageClass().equals("GLACIER")` in the 
acceptor here? After we do this and call `getContentSummary()` it won't return 
glacier files in the fileCount. 

I'm not worried about that. Is the storage type returned in the list call, 
allowing it to be filtered there? I wouldn't want to do any HEAD requests here.
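
A minimal sketch of the filtering idea, assuming the storage class does arrive 
with each listing entry so that no HEAD request is needed. `ObjectSummary` and 
`skipGlaciated` are hypothetical stand-ins for illustration, not the S3A 
acceptor or any AWS SDK type. The comparison is written constant-first so that 
an entry with no storage class does not throw.

```java
import java.util.List;
import java.util.stream.Collectors;

public class GlacierFilterSketch {

    /** Hypothetical stand-in for one entry of a list call. */
    static final class ObjectSummary {
        final String key;
        final String storageClass; // may be null if the listing omits it

        ObjectSummary(String key, String storageClass) {
            this.key = key;
            this.storageClass = storageClass;
        }
    }

    /** True if the listing entry is marked as being in glacier. */
    static boolean isGlaciated(ObjectSummary summary) {
        // constant-first equals() is null-safe
        return "GLACIER".equals(summary.storageClass);
    }

    /** Keep only the entries that are not in glacier. */
    static List<ObjectSummary> skipGlaciated(List<ObjectSummary> listing) {
        return listing.stream()
            .filter(s -> !isGlaciated(s))
            .collect(Collectors.toList());
    }
}
```

Anything computed from the filtered listing, such as a file count, would then 
leave the glaciated entries out.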

bq. getBlockLocations()

There's special handling in Spark for that location, which says "run your work 
anywhere". We don't want to break that.

I think the best tactic here is to work out what people want to do here and 
provide the bare minimum. Looking at some of the JIRAs, there's no consensus as 
to what people want. Do they want glaciated files to be skipped in queries, or 
for recovery to be triggered (somehow)? Returning the storage type ARCHIVE 
would be enough for anyone who wants to identify these files (distcp?) and at 
least then know there's a cost in accessing them.
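
As a rough illustration of that bare minimum, the sketch below maps a storage 
class string to a tier and counts entries per tier, which is also roughly what 
an `objects_in_glacier` statistic would report. The `Tier` enum, `tierFor` and 
`countByTier` are hypothetical names used only here. `Tier` is a local stand-in 
so the example is self-contained, and this is not how S3A is implemented.

```java
import java.util.EnumMap;
import java.util.Map;

public class StorageTierSketch {

    /** Local stand-in for a storage-type value; hypothetical. */
    enum Tier { DEFAULT, ARCHIVE }

    /** Map a storage class string from a listing to a tier. */
    static Tier tierFor(String storageClass) {
        // null or any non-glacier class is treated as directly readable
        return "GLACIER".equals(storageClass) ? Tier.ARCHIVE : Tier.DEFAULT;
    }

    /** Count listing entries per tier; input maps object key to storage class. */
    static Map<Tier, Integer> countByTier(Map<String, String> keyToStorageClass) {
        Map<Tier, Integer> counts = new EnumMap<>(Tier.class);
        for (String storageClass : keyToStorageClass.values()) {
            counts.merge(tierFor(storageClass), 1, Integer::sum);
        }
        return counts;
    }
}
```

A caller that only wants to know whether access carries a cost can check for 
`Tier.ARCHIVE` without the filesystem deciding whether to skip the file or 
trigger recovery.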

> Handle S3A "glacier" data
> -------------------------
>
>                 Key: HADOOP-14837
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14837
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.0-beta1
>            Reporter: Steve Loughran
>            Priority: Minor
>
> SPARK-21797 covers how, if you have AWS S3 set to copy some files to glacier, 
> they appear in the listing but GETs fail, and so does everything else.
> We should think about how best to handle this.
> # report better
> # if listings can identify files which are glaciated then maybe we could have 
> an option to filter them out
> # test & see what happens



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

