Steve Loughran created HADOOP-14159: ---------------------------------------
Summary: Add some Java-8 friendly way to work with RemoteIterable, especially listings
Key: HADOOP-14159
URL: https://issues.apache.org/jira/browse/HADOOP-14159
Project: Hadoop Common
Issue Type: Improvement
Components: fs
Affects Versions: 3.0.0-alpha2
Reporter: Steve Loughran

There's a fair amount of Hadoop code which uses {{FileSystem.listStatus(path)}} just to get a {{FileStatus[]}} array which it can then iterate over in a {{for}} loop. This is inefficient and scales badly, as the entire listing is done before the compute; it cannot handle directories with millions of entries.

The {{listLocatedStatus()}} calls return a {{RemoteIterator}}, which can't be used in {{for}} loops as it has the right to throw an IOException in any {{hasNext()}}/{{next()}} call. That doesn't matter, as we now have closures and simple stream operations.

{code}
listLocatedStatus(path).filter((st) -> st.length > 0).apply(st -> fs.delete(st.path))
{code}

See? We could do shiny new closure things.

It wouldn't necessarily need changes to FileSystem either, just something which took a {{RemoteIterator}} and let you chain some closures off it, similar to the Java 8 streams operations.

Once implemented, we can move to using it in the Hadoop code wherever we use {{listFiles()}} today.
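
To make the "chain some closures off a RemoteIterator" idea concrete, here is a minimal sketch of what such a helper might look like. Everything in it is an assumption for illustration: the class name {{RemoteIteratorSketch}}, the {{IOPredicate}} and {{IOConsumer}} interfaces, and the {{filter}}/{{foreach}}/{{fromIterator}} methods are hypothetical, and the nested {{RemoteIterator}} interface is only a stand-in with the same shape as Hadoop's (both methods may throw IOException) so that the example compiles without Hadoop on the classpath. It is not the API proposed in this issue.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class RemoteIteratorSketch {

  /** Stand-in with the same shape as org.apache.hadoop.fs.RemoteIterator. */
  public interface RemoteIterator<E> {
    boolean hasNext() throws IOException;
    E next() throws IOException;
  }

  /** Hypothetical predicate that is allowed to throw IOException. */
  @FunctionalInterface
  public interface IOPredicate<T> {
    boolean test(T t) throws IOException;
  }

  /** Hypothetical consumer that is allowed to throw IOException. */
  @FunctionalInterface
  public interface IOConsumer<T> {
    void accept(T t) throws IOException;
  }

  /** Lazily filter a remote iterator; entries are fetched only on demand. */
  public static <T> RemoteIterator<T> filter(
      RemoteIterator<T> source, IOPredicate<? super T> predicate) {
    return new RemoteIterator<T>() {
      private T pending;          // next matching element, if already fetched
      private boolean hasPending; // true when 'pending' holds a valid element

      @Override
      public boolean hasNext() throws IOException {
        // Advance the source until a matching element is found or it runs out.
        while (!hasPending && source.hasNext()) {
          T candidate = source.next();
          if (predicate.test(candidate)) {
            pending = candidate;
            hasPending = true;
          }
        }
        return hasPending;
      }

      @Override
      public T next() throws IOException {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        T result = pending;
        pending = null;
        hasPending = false;
        return result;
      }
    };
  }

  /** Apply an action to every remaining element; returns how many were seen. */
  public static <T> long foreach(
      RemoteIterator<T> source, IOConsumer<? super T> action) throws IOException {
    long count = 0;
    while (source.hasNext()) {
      action.accept(source.next());
      count++;
    }
    return count;
  }

  /** Test helper: adapt an in-memory iterator to the remote iterator shape. */
  public static <T> RemoteIterator<T> fromIterator(Iterator<T> it) {
    return new RemoteIterator<T>() {
      @Override public boolean hasNext() { return it.hasNext(); }
      @Override public T next() { return it.next(); }
    };
  }

  public static void main(String[] args) throws IOException {
    // Strings stand in for FileStatus entries; empty ones play "zero length".
    RemoteIterator<String> names =
        fromIterator(Arrays.asList("a", "", "bc", "").iterator());
    List<String> kept = new ArrayList<>();
    long n = foreach(filter(names, s -> !s.isEmpty()), kept::add);
    System.out.println(n + " " + kept); // prints "2 [a, bc]"
  }
}
```

Because the filter wraps the source rather than draining it, a listing with millions of entries would still be consumed page by page, and any IOException raised by the underlying listing or by a closure propagates to the caller of {{foreach}}. With the real Hadoop classes, the source would presumably be the iterator returned by {{listLocatedStatus(path)}}, with the predicate and action reading {{FileStatus}} fields.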