[ https://issues.apache.org/jira/browse/HADOOP-14159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Loughran resolved HADOOP-14159.
-------------------------------------
    Fix Version/s: 3.3.1
       Resolution: Duplicate

done inside HADOOP-17450

> Add some Java-8 friendly way to work with RemoteIterable, especially listings
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-14159
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14159
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Steve Loughran
>            Priority: Minor
>             Fix For: 3.3.1
>
>
> There's a fair amount of Hadoop code which uses {{FileSystem.listStatus(path)}}
> just to get a {{FileStatus[]}} array to iterate over in a {{for}} loop.
> This is inefficient and scales badly: the entire listing must complete, and
> be held in memory, before any processing starts, so it cannot handle
> directories with millions of entries.
> The {{listLocatedStatus()}} calls return a {{RemoteIterator}}, which can't be
> used in {{for}} loops, as it is not an {{Iterable}} and any {{hasNext()}} or
> {{next()}} call may throw an {{IOException}}. That needn't matter now that we
> have closures and simple stream operations.
> {code}
> listLocatedStatus(path).filter((st) -> st.getLen() > 0).apply(st ->
>     fs.delete(st.getPath(), false))
> {code}
> See? We could do shiny new closure things. It wouldn't necessarily need
> changes to FileSystem either, just something which took {{RemoteIterator}}
> and let you chain some closures off it, similar to the Java 8 streams
> operations.
> Once implemented, we can move to using it in the Hadoop code wherever we use
> listFiles() today.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
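The closure-chaining idea in the issue description can be illustrated with a small, self-contained Java sketch. This is not the code that landed under HADOOP-17450. So that it compiles without Hadoop on the classpath, it declares its own RemoteIterator interface with the same shape as org.apache.hadoop.fs.RemoteIterator, where hasNext() and next() may both throw IOException. Every other name here (RemoteIteratorSketch, filter, foreach, fromIterator, IOPredicate, IOConsumer, FakeStatus) is hypothetical. The sketch tries to show two points:

* The closures need IOException-raising functional interfaces, because java.util.function.Predicate and Consumer cannot throw checked exceptions.
* The filter is lazy, so entries are pulled from the listing one at a time rather than collected into an array first.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class RemoteIteratorSketch {

  /** Local stand-in with the same shape as org.apache.hadoop.fs.RemoteIterator. */
  public interface RemoteIterator<E> {
    boolean hasNext() throws IOException;
    E next() throws IOException;
  }

  /** Like java.util.function.Predicate, but allowed to throw IOException. */
  @FunctionalInterface
  public interface IOPredicate<T> {
    boolean test(T t) throws IOException;
  }

  /** Like java.util.function.Consumer, but allowed to throw IOException. */
  @FunctionalInterface
  public interface IOConsumer<T> {
    void accept(T t) throws IOException;
  }

  /** Hypothetical stand-in for FileStatus: just a path and a length. */
  public static final class FakeStatus {
    private final String path;
    private final long len;
    public FakeStatus(String path, long len) { this.path = path; this.len = len; }
    public String getPath() { return path; }
    public long getLen() { return len; }
  }

  /** Adapts an in-memory iterator, e.g. to simulate a directory listing. */
  public static <T> RemoteIterator<T> fromIterator(Iterator<T> it) {
    return new RemoteIterator<T>() {
      @Override public boolean hasNext() { return it.hasNext(); }
      @Override public T next() { return it.next(); }
    };
  }

  /** Lazy filter: the source is advanced only when the caller asks for more. */
  public static <T> RemoteIterator<T> filter(
      RemoteIterator<T> source, IOPredicate<? super T> predicate) {
    return new RemoteIterator<T>() {
      private T pending;
      private boolean hasPending;

      @Override public boolean hasNext() throws IOException {
        // Pull from the source until an element matches or the source runs out.
        while (!hasPending && source.hasNext()) {
          T candidate = source.next();
          if (predicate.test(candidate)) {
            pending = candidate;
            hasPending = true;
          }
        }
        return hasPending;
      }

      @Override public T next() throws IOException {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        T result = pending;
        pending = null;
        hasPending = false;
        return result;
      }
    };
  }

  /** Applies an action to each remaining element; any IOException propagates. */
  public static <T> long foreach(
      RemoteIterator<T> source, IOConsumer<? super T> action) throws IOException {
    long count = 0;
    while (source.hasNext()) {
      action.accept(source.next());
      count++;
    }
    return count;
  }

  public static void main(String[] args) throws IOException {
    List<FakeStatus> listing = Arrays.asList(new FakeStatus("/tmp/a", 10),
        new FakeStatus("/tmp/empty", 0), new FakeStatus("/tmp/b", 3));
    List<String> deleted = new ArrayList<>();
    // Filter out zero-length entries, then act on each survivor.
    // deleted.add(...) stands in for something like fs.delete(path, false).
    RemoteIterator<FakeStatus> nonEmpty =
        filter(fromIterator(listing.iterator()), st -> st.getLen() > 0);
    long n = foreach(nonEmpty, st -> deleted.add(st.getPath()));
    System.out.println(n + " " + deleted); // prints "2 [/tmp/a, /tmp/b]"
  }
}
```

An alternative design is to wrap each IOException in java.io.UncheckedIOException and adapt the iterator to a java.util.stream.Stream. That reuses the standard stream operators, but callers then have to unwrap the exception again. The IOException-raising interfaces in the sketch avoid that cost.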