[
https://issues.apache.org/jira/browse/HADOOP-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16771817#comment-16771817
]
Steve Loughran commented on HADOOP-16077:
-----------------------------------------
If you call {{FileSystem.listFiles(path, recursive)}}, you get a
RemoteIterator<LocatedFileStatus>; each LocatedFileStatus contains an array of
BlockLocation entries, which are meant to contain the block locations and storage types.
This is the best API for a recursive file listing, because it offers:
* on HDFS: bulk incremental updates, which reduce marshalling and the time the
NameNode (NN) is locked
* on object stores: the option of switching to more efficient path enumeration
over treewalks. S3A does this & delivers O(files/1000) listings irrespective of
the directory tree depth
Now, that's a bigger leap for ls -R than just listing the storage type, but
it'd be great to expose that operation in general, because ls -R is so
inefficient here.
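To make the cost difference concrete, here is a rough sketch of a toy object store that counts list requests for a flat, paged enumeration versus a per-directory treewalk. Nothing in it is Hadoop or S3A code: the {{ListingCost}} class, its methods and the page size of 1000 are assumptions made up for illustration, with the page size chosen to match the O(files/1000) figure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy model of an object store: a sorted set of keys, no real directories. */
class ListingCost {
    static final int PAGE_SIZE = 1000; // assumed page size, not taken from any real store

    private final TreeSet<String> keys = new TreeSet<>();
    int requests = 0; // simulated list calls issued so far

    void put(String key) { keys.add(key); }

    private static int pages(int entries) {
        return Math.max(1, (entries + PAGE_SIZE - 1) / PAGE_SIZE);
    }

    /** Flat enumeration: page through every key under the prefix, ignoring depth. */
    List<String> flatList(String prefix) {
        List<String> result = new ArrayList<>();
        for (String key : keys.tailSet(prefix, true)) {
            if (!key.startsWith(prefix)) break; // keys with the prefix are contiguous
            result.add(key);
        }
        requests += pages(result.size());
        return result;
    }

    /** Treewalk: list one directory level, then recurse into each child directory. */
    List<String> treeWalk(String dir) {
        List<String> files = new ArrayList<>();
        TreeSet<String> subdirs = new TreeSet<>();
        for (String key : keys.tailSet(dir, true)) {
            if (!key.startsWith(dir)) break;
            String rest = key.substring(dir.length());
            int slash = rest.indexOf('/');
            if (slash < 0) files.add(key);
            else subdirs.add(dir + rest.substring(0, slash + 1));
        }
        requests += pages(files.size() + subdirs.size()); // at least one call per directory
        for (String sub : subdirs) files.addAll(treeWalk(sub));
        return files;
    }

    /** Build a chain of nested directories under "data/", each holding a few files. */
    static ListingCost sample(int depth, int filesPerDir) {
        ListingCost store = new ListingCost();
        String dir = "data/";
        for (int d = 0; d < depth; d++) {
            dir = dir + "level" + d + "/";
            for (int f = 0; f < filesPerDir; f++) store.put(dir + "file" + f);
        }
        return store;
    }

    public static void main(String[] args) {
        ListingCost flat = sample(50, 10);
        flat.flatList("data/");
        ListingCost walk = sample(50, 10);
        walk.treeWalk("data/");
        System.out.println("flat: " + flat.requests + " requests, treewalk: "
            + walk.requests + " requests");
    }
}
```

With 500 files spread over 50 nested directories, the flat listing needs a single request in this model, while the treewalk needs one per directory visited, 51 in total. The gap grows with tree depth, not with file count.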
Trouble is, of course, that both Ls and LsR extend Command, which implements its
treewalk recursively. Moving to a new iterator would be traumatic. Except
maybe, just maybe, we could have it support both forms of list & recurse, with
the new form as an option to switch to: if you ask for storage levels, you
must explicitly ask for the new recurse option.
Maybe a separate "deepLs" command would be the strategy.
Have a look at {{S3AUtils.applyLocatedFiles()}} if you want to see some fun
with closures and iterating over a list of LocatedFileStatus entries. That
could all be promoted into {{org.apache.hadoop.util.LambdaUtils}} or the new
{{org.apache.hadoop.fs.impl}} package.
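For anyone who hasn't seen that pattern, the general shape is roughly as below. This is a standalone sketch, not the S3A code: the nested {{RemoteIterator}} is a local stand-in that only mirrors the two-method, IOException-throwing shape of Hadoop's interface, and {{CallOnEntry}}, {{applyToAll}} and {{fromList}} are hypothetical names invented here so that it runs with nothing but the JDK.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class RemoteIteration {
    /** Local stand-in: like Hadoop's RemoteIterator, both methods may throw IOException. */
    interface RemoteIterator<T> {
        boolean hasNext() throws IOException;
        T next() throws IOException;
    }

    /** Hypothetical callback type: a closure that may itself raise an IOException. */
    @FunctionalInterface
    interface CallOnEntry<T> {
        void call(T entry) throws IOException;
    }

    /** Apply the closure to every entry and return how many entries were processed. */
    static <T> long applyToAll(RemoteIterator<T> it, CallOnEntry<? super T> action)
            throws IOException {
        long count = 0;
        while (it.hasNext()) {
            action.call(it.next());
            count++;
        }
        return count;
    }

    /** Wrap an in-memory list so the sketch can run without a cluster. */
    static <T> RemoteIterator<T> fromList(List<T> items) {
        Iterator<T> inner = items.iterator();
        return new RemoteIterator<T>() {
            @Override public boolean hasNext() { return inner.hasNext(); }
            @Override public T next() { return inner.next(); }
        };
    }

    public static void main(String[] args) throws IOException {
        long[] totalLength = {0}; // array cell so the lambda can update it
        long n = applyToAll(
            fromList(Arrays.asList("a.csv", "bb.csv", "ccc.csv")),
            name -> totalLength[0] += name.length());
        System.out.println(n + " entries, " + totalLength[0] + " characters");
    }
}
```

The point of the dedicated callback interface is that the closure can throw IOException directly, which java.util.function.Consumer does not allow; that is why a helper like this needs its own functional interface.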
BTW: I'm thinking that we could have the object stores expose the archive
status of files in the storage type, so things like AWS Glacier storage would
be visible. Being able to list that here would be ideal.
> Add an option in ls command to include storage policy
> -----------------------------------------------------
>
> Key: HADOOP-16077
> URL: https://issues.apache.org/jira/browse/HADOOP-16077
> Project: Hadoop Common
> Issue Type: Improvement
> Components: common
> Affects Versions: 3.3.0
> Reporter: Ayush Saxena
> Assignee: Ayush Saxena
> Priority: Major
> Attachments: HADOOP-16077-01.patch, HADOOP-16077-02.patch,
> HADOOP-16077-03.patch, HADOOP-16077-04.patch, HADOOP-16077-05.patch,
> HADOOP-16077-06.patch, HADOOP-16077-07.patch, HADOOP-16077-08.patch,
> HADOOP-16077-09.patch
>
>