[
https://issues.apache.org/jira/browse/HADOOP-14837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17515405#comment-17515405
]
Ahmar Suhail edited comment on HADOOP-14837 at 3/31/22, 3:42 PM:
-----------------------------------------------------------------
[[email protected]] Object summaries include the storage class, which means we
can filter without any additional HEAD calls.
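
A minimal sketch of that filtering idea, not the S3A implementation. `ListedObject` and `GlacierListingFilter` are hypothetical names standing in for the listing entries and the filter, and the set of storage classes treated as archived is an assumption:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for one entry of an S3 listing response. */
final class ListedObject {
    final String key;
    final String storageClass; // e.g. "STANDARD", "GLACIER", "DEEP_ARCHIVE"

    ListedObject(String key, String storageClass) {
        this.key = key;
        this.storageClass = storageClass;
    }
}

/** Hypothetical filter that drops archived objects using only listing data. */
final class GlacierListingFilter {
    // Assumption: these are the classes whose objects fail on a plain GET.
    static final Set<String> ARCHIVED =
        new HashSet<>(Arrays.asList("GLACIER", "DEEP_ARCHIVE"));

    /** Returns the entries whose storage class is not in ARCHIVED. */
    static List<ListedObject> readable(List<ListedObject> listing) {
        List<ListedObject> result = new ArrayList<>();
        for (ListedObject o : listing) {
            // The storage class comes from the listing itself, so no HEAD is needed.
            if (!ARCHIVED.contains(o.storageClass)) {
                result.add(o);
            }
        }
        return result;
    }
}
```

The filter only reads fields already present in the listing, so the cost stays at one LIST request per page of results.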
For getBlockLocations(), I looked at how Spark uses it and found that it is
called
[here|https://github.com/apache/spark/blob/0494dc90af48ce7da0625485a4dc6917a244d580/core/src/main/scala/org/apache/spark/util/HadoopFSUtils.scala#L307].
If we implement getBlockLocations() in S3FS to return the storage type, we
would have to make a HEAD call, which would slow down the usage above. I'm not
sure that's something we should do.
If we do want to implement getBlockLocations(), we could add a configuration
option such as `fs.s3a.get.file.locations` which, when enabled, makes the HEAD
call; otherwise we just return the default location.
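
A rough sketch of how such a switch could behave, under stated assumptions. Only the option name `fs.s3a.get.file.locations` comes from the proposal above; `BlockLocationSketch`, the `headStorageClass` function standing in for the HEAD request, and the `"localhost"` placeholder for the default location are all hypothetical:

```java
import java.util.Map;
import java.util.function.Function;

/** Hypothetical model of a config-gated block location lookup. */
final class BlockLocationSketch {
    // Option name as proposed in the comment; it is not an existing setting.
    static final String OPTION = "fs.s3a.get.file.locations";
    // Placeholder for whatever the default location would be.
    static final String DEFAULT_LOCATION = "localhost";

    private final Map<String, String> conf;
    // Stand-in for a HEAD request that returns the object's storage class.
    private final Function<String, String> headStorageClass;
    // Counts simulated HEAD calls so the cost of enabling the option is visible.
    int headCalls = 0;

    BlockLocationSketch(Map<String, String> conf,
                        Function<String, String> headStorageClass) {
        this.conf = conf;
        this.headStorageClass = headStorageClass;
    }

    /** Returns the storage class when the option is on, else the default. */
    String locate(String key) {
        boolean enabled =
            Boolean.parseBoolean(conf.getOrDefault(OPTION, "false"));
        if (!enabled) {
            return DEFAULT_LOCATION; // fast path: no extra request
        }
        headCalls++;                 // one HEAD per file when enabled
        return headStorageClass.apply(key);
    }
}
```

With the option left at its default, callers such as the Spark listing code would pay no extra request per file; enabling it trades one HEAD per file for the storage information.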
> Handle S3A "glacier" data
> -------------------------
>
> Key: HADOOP-14837
> URL: https://issues.apache.org/jira/browse/HADOOP-14837
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.0.0-beta1
> Reporter: Steve Loughran
> Priority: Minor
>
> SPARK-21797 covers how, if you have AWS S3 set to copy some files to Glacier,
> they appear in the listing but GETs fail, and so does everything else.
> We should think about how best to handle this.
> # report better
> # if listings can identify files which are glaciated then maybe we could have
> an option to filter them out
> # test & see what happens
--
This message was sent by Atlassian Jira
(v8.20.1#820001)