Steve Loughran created HADOOP-14159:
---------------------------------------
Summary: Add some Java-8 friendly way to work with RemoteIterable,
especially listings
Key: HADOOP-14159
URL: https://issues.apache.org/jira/browse/HADOOP-14159
Project: Hadoop Common
Issue Type: Improvement
Components: fs
Affects Versions: 3.0.0-alpha2
Reporter: Steve Loughran
There's a fair amount of Hadoop code which uses {{FileSystem.listStatus(path)}}
just to get a {{FileStatus[]}} array which it can then iterate over in a
{{for}} loop.
This is inefficient and scales badly, as the entire listing is fetched and held
in an array before any processing starts; it cannot handle directories with
millions of entries.
The {{listLocatedStatus()}} calls return a {{RemoteIterator}}, which can't be
used in for-each loops as its {{hasNext()}} and {{next()}} calls each have the
right to throw an {{IOException}}.
That doesn't matter, as we now have closures and simple stream operations.
{code}
// proposed API: filter() and apply() do not exist on RemoteIterator today
fs.listLocatedStatus(path)
  .filter(st -> st.getLen() > 0)
  .apply(st -> fs.delete(st.getPath(), false));
{code}
See? We could do shiny new closure things. It wouldn't necessarily need changes
to FileSystem either, just something which took {{RemoteIterator}} and let you
chain some closures off it, similar to the Java 8 streams operations.
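
To make the idea concrete, here is a rough sketch of what such a wrapper could look like. It is only an illustration under stated assumptions: {{RemoteStream}}, {{IOPredicate}}, {{IOConsumer}} and {{fromIterator()}} are hypothetical names, not existing Hadoop classes, and the {{RemoteIterator}} declared here is a local stand-in with the same shape as {{org.apache.hadoop.fs.RemoteIterator}} so the example compiles without Hadoop on the classpath. The terminal operation is called {{forEach}} here rather than {{apply}}; either name would do.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Local stand-in with the same shape as org.apache.hadoop.fs.RemoteIterator. */
interface RemoteIterator<T> {
    boolean hasNext() throws IOException;
    T next() throws IOException;
}

/** Hypothetical predicate which is allowed to throw an IOException. */
@FunctionalInterface
interface IOPredicate<T> {
    boolean test(T value) throws IOException;
}

/** Hypothetical consumer which is allowed to throw an IOException. */
@FunctionalInterface
interface IOConsumer<T> {
    void accept(T value) throws IOException;
}

/** Hypothetical chainable wrapper around a RemoteIterator. */
final class RemoteStream<T> {
    private final RemoteIterator<T> source;

    private RemoteStream(RemoteIterator<T> source) {
        this.source = source;
    }

    static <T> RemoteStream<T> of(RemoteIterator<T> source) {
        return new RemoteStream<>(source);
    }

    /** Demo helper: adapts an in-memory iterator to the remote shape. */
    static <T> RemoteIterator<T> fromIterator(Iterator<T> it) {
        return new RemoteIterator<T>() {
            @Override public boolean hasNext() { return it.hasNext(); }
            @Override public T next() { return it.next(); }
        };
    }

    /** Lazy filter: entries are pulled from upstream only on demand. */
    RemoteStream<T> filter(IOPredicate<? super T> predicate) {
        RemoteIterator<T> upstream = source;
        return new RemoteStream<>(new RemoteIterator<T>() {
            private T pending;      // next matching entry, if one is buffered
            private boolean ready;  // true when 'pending' holds a match

            @Override
            public boolean hasNext() throws IOException {
                while (!ready && upstream.hasNext()) {
                    T candidate = upstream.next();
                    if (predicate.test(candidate)) {
                        pending = candidate;
                        ready = true;
                    }
                }
                return ready;
            }

            @Override
            public T next() throws IOException {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                ready = false;
                T result = pending;
                pending = null;
                return result;
            }
        });
    }

    /** Terminal operation: processes entries one at a time, returns the count. */
    long forEach(IOConsumer<? super T> action) throws IOException {
        long count = 0;
        while (source.hasNext()) {
            action.accept(source.next());
            count++;
        }
        return count;
    }
}

class RemoteStreamDemo {
    public static void main(String[] args) throws IOException {
        // Strings stand in for FileStatus entries; empty ones are skipped.
        List<String> names = Arrays.asList("part-0000", "", "part-0001");
        long handled = RemoteStream.of(RemoteStream.fromIterator(names.iterator()))
            .filter(name -> !name.isEmpty())
            .forEach(name -> System.out.println("would delete " + name));
        System.out.println(handled + " entries processed");
    }
}
```

The point of the sketch is that nothing is buffered beyond a single entry, and any {{IOException}} raised by the underlying iterator or by a closure surfaces from the terminal call rather than being swallowed or wrapped.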
Once implemented, we can move to using it in the Hadoop code wherever we use
{{listFiles()}} today.