Steve Loughran updated HADOOP-14159:
        Parent: HADOOP-16829
    Issue Type: Sub-task  (was: Improvement)

> Add some Java-8 friendly way to work with RemoteIterable, especially listings
> -----------------------------------------------------------------------------
>                 Key: HADOOP-14159
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14159
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Steve Loughran
>            Priority: Minor
> There's a fair amount of Hadoop code which calls {{FileSystem.listStatus(path)}} 
> just to get a {{FileStatus[]}} array which it can then iterate over in 
> a {{for}} loop.
> This is inefficient and scales badly, as the entire listing is built in memory 
> before any processing starts; it cannot handle directories with millions of entries.
> The listLocatedStatus() calls return a {{RemoteIterator}}, which can't be 
> used in a for-each loop because its hasNext() and next() calls may each throw 
> an IOException. That doesn't matter, as we now have closures and simple stream 
> operations.
> {code}
>  listLocatedStatus(path).filter(st -> st.getLen() > 0).apply(st -> 
> fs.delete(st.getPath(), false))
> {code}
> See? We could do shiny new closure things. It wouldn't necessarily need 
> changes to FileSystem either, just something which took {{RemoteIterator}} 
> and let you chain some closures off it, similar to the Java 8 streams 
> operations.
> Once implemented, we can move to using it in the Hadoop code wherever we use 
> listFiles() today.
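
A minimal sketch of what such a wrapper might look like, to make the proposal concrete. Everything here is an assumption for illustration: the {{RemoteIterator}} interface below is a local stand-in that only mirrors the shape of Hadoop's {{org.apache.hadoop.fs.RemoteIterator}} (hasNext/next, both allowed to throw IOException) so the example compiles without Hadoop, and {{RemoteIterators}}, {{IOPredicate}}, {{IOConsumer}}, {{filter}}, {{foreach}} and {{fromIterator}} are hypothetical names, not an API this issue defines.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Local stand-in mirroring the shape of org.apache.hadoop.fs.RemoteIterator. */
interface RemoteIterator<E> {
  boolean hasNext() throws IOException;
  E next() throws IOException;
}

/** Hypothetical functional interfaces whose methods may throw IOException. */
interface IOPredicate<T> { boolean test(T t) throws IOException; }
interface IOConsumer<T> { void accept(T t) throws IOException; }

/** Hypothetical helper class, for illustration only. */
final class RemoteIterators {
  private RemoteIterators() {}

  /** Lazy filter: elements are pulled from the source only on demand. */
  static <T> RemoteIterator<T> filter(RemoteIterator<T> source,
                                      IOPredicate<? super T> keep) {
    return new RemoteIterator<T>() {
      private T pending;      // next matching element, if already fetched
      private boolean ready;  // true when 'pending' holds a valid element

      @Override public boolean hasNext() throws IOException {
        // Advance until a matching element is found or the source is exhausted.
        while (!ready && source.hasNext()) {
          T candidate = source.next();
          if (keep.test(candidate)) {
            pending = candidate;
            ready = true;
          }
        }
        return ready;
      }

      @Override public T next() throws IOException {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        ready = false;
        T result = pending;
        pending = null;
        return result;
      }
    };
  }

  /** Apply the action to every element; returns how many were processed. */
  static <T> long foreach(RemoteIterator<T> source,
                          IOConsumer<? super T> action) throws IOException {
    long count = 0;
    while (source.hasNext()) {
      action.accept(source.next());
      count++;
    }
    return count;
  }

  /** Adapter for trying the helpers out on an in-memory java.util.Iterator. */
  static <T> RemoteIterator<T> fromIterator(Iterator<T> it) {
    return new RemoteIterator<T>() {
      @Override public boolean hasNext() { return it.hasNext(); }
      @Override public T next() { return it.next(); }
    };
  }
}
```

If the local interface were swapped for the real Hadoop one, the example from the description could plausibly be written as {{RemoteIterators.foreach(RemoteIterators.filter(fs.listLocatedStatus(path), st -> st.getLen() > 0), st -> fs.delete(st.getPath(), false))}}. Because the filter pulls entries one at a time, no full listing is ever held in memory, and any IOException raised mid-listing still propagates to the caller of {{foreach}}.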
