[
https://issues.apache.org/jira/browse/HADOOP-14159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Steve Loughran resolved HADOOP-14159.
-------------------------------------
Fix Version/s: 3.3.1
Resolution: Duplicate
Done inside HADOOP-17450.
> Add some Java-8 friendly way to work with RemoteIterable, especially listings
> -----------------------------------------------------------------------------
>
> Key: HADOOP-14159
> URL: https://issues.apache.org/jira/browse/HADOOP-14159
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs
> Affects Versions: 3.0.0-alpha2
> Reporter: Steve Loughran
> Priority: Minor
> Fix For: 3.3.1
>
>
> There's a fair amount of Hadoop code which calls
> {{FileSystem.listStatus(path)}} just to get a {{FileStatus[]}} array which it
> can then iterate over in a {{for}} loop.
> This is inefficient and scales badly: the entire listing is completed before
> any processing starts, so it cannot handle directories with millions of
> entries.
> The {{listLocatedStatus()}} calls return a {{RemoteIterator}}, which can't be
> used in for loops as it has the right to throw an {{IOException}} in any
> {{hasNext()}}/{{next()}} call. That doesn't matter, as we now have closures
> and simple stream operations.
> {code}
> // Proposed usage: filter() and apply() do not exist on RemoteIterator today.
> fs.listLocatedStatus(path)
>   .filter(st -> st.getLen() > 0)
>   .apply(st -> fs.delete(st.getPath(), false));
> {code}
> See? We could do shiny new closure things. It wouldn't necessarily need
> changes to FileSystem either, just something which takes a {{RemoteIterator}}
> and lets you chain some closures off it, similar to the Java 8 streams
> operations.
> Once implemented, we can move to using it in the Hadoop code wherever we use
> {{listFiles()}} today.
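
The chaining idea described in the issue can be sketched without touching
FileSystem. What follows is a minimal, self-contained illustration and not the
code that went into HADOOP-17450. The nested RemoteIterator interface is a
local stand-in with the same shape as org.apache.hadoop.fs.RemoteIterator, so
the example compiles without Hadoop on the classpath. RemoteIteratorSketch,
IOPredicate, IOConsumer, fromList, filter and forEach are hypothetical names
chosen for this example. A list of file lengths stands in for a real
directory listing.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Sketch only: none of these types are Hadoop's; all names are hypothetical. */
final class RemoteIteratorSketch {

  /** Stand-in with the same shape as org.apache.hadoop.fs.RemoteIterator. */
  interface RemoteIterator<E> {
    boolean hasNext() throws IOException;
    E next() throws IOException;
  }

  /** Predicate that may raise an IOException (hypothetical). */
  interface IOPredicate<E> {
    boolean test(E element) throws IOException;
  }

  /** Consumer that may raise an IOException (hypothetical). */
  interface IOConsumer<E> {
    void accept(E element) throws IOException;
  }

  /** Adapts an in-memory list so the sketch can run without a filesystem. */
  static <E> RemoteIterator<E> fromList(List<E> items) {
    final Iterator<E> it = items.iterator();
    return new RemoteIterator<E>() {
      @Override public boolean hasNext() { return it.hasNext(); }
      @Override public E next() { return it.next(); }
    };
  }

  /** Lazily skips elements that fail the predicate; looks ahead one element. */
  static <E> RemoteIterator<E> filter(final RemoteIterator<E> source,
                                      final IOPredicate<? super E> predicate) {
    return new RemoteIterator<E>() {
      private E pending;
      private boolean hasPending;

      @Override public boolean hasNext() throws IOException {
        // Pull from the source until a match is buffered or the source ends.
        while (!hasPending && source.hasNext()) {
          E candidate = source.next();
          if (predicate.test(candidate)) {
            pending = candidate;
            hasPending = true;
          }
        }
        return hasPending;
      }

      @Override public E next() throws IOException {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        E result = pending;
        pending = null;
        hasPending = false;
        return result;
      }
    };
  }

  /** Applies the action to each element and returns how many were processed. */
  static <E> long forEach(RemoteIterator<E> source, IOConsumer<? super E> action)
      throws IOException {
    long count = 0;
    while (source.hasNext()) {
      action.accept(source.next());
      count++;
    }
    return count;
  }

  public static void main(String[] args) throws IOException {
    // File lengths stand in for FileStatus entries from a listing.
    RemoteIterator<Long> lengths = fromList(Arrays.asList(0L, 12L, 0L, 7L));
    List<Long> kept = new ArrayList<>();
    long processed = forEach(filter(lengths, len -> len > 0), kept::add);
    System.out.println(processed + " non-empty entries: " + kept);
  }
}
```

Because filter() only pulls from the source when hasNext() or next() is
called, entries are processed as they arrive rather than after the whole
listing has been collected, which is the scaling benefit the issue asks for.
The IOException-aware functional interfaces are the design choice that lets
the closures call filesystem operations without wrapping checked exceptions.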
--
This message was sent by Atlassian Jira
(v8.3.4#803005)