[
https://issues.apache.org/jira/browse/HADOOP-10798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592311#comment-14592311
]
Colin Patrick McCabe commented on HADOOP-10798:
-----------------------------------------------
OK, I did a little more research into this. The {{globStatus}} code back in
2.0.3-alpha did sort the entries it returned. Around Hadoop 2.3, the sort was
lost during the globber rewrite. This was a bug, but it was hidden by the fact
that HDFS sorts its listStatus entries (this behavior is undocumented, but 100%
consistent).
Since the API documentation says that sorted entries are returned, and since
this is the case for the vast majority of use-cases (i.e. when using Hadoop
with HDFS), I think changing this behavior in {{globStatus}} would be an
incompatible change. Any user code relying on the old documented behavior
would break. Let's commit the original patch I posted to fix this situation.
If we want to have a discussion about changing the API contract we can have
that discussion for Hadoop 3.0 only.
Also, I feel that the facts that:
1. globStatus has historically had a sort in it
2. users who want to optimize by avoiding a sort can use listStatus
strongly suggest that changing this behavior is not a good idea, even in 3.x.
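To make the contract concrete, here is a minimal sketch of the "results are sorted by their names" guarantee. It is not the attached patch and does not use the Hadoop API: plain path strings stand in for {{FileStatus}} entries, and the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical stand-in: plain path strings replace the FileStatus entries
// that globStatus returns, so the sketch has no Hadoop dependency.
final class GlobSortSketch {
    // Returns a sorted copy, leaving the caller's array untouched.
    static String[] sortedByName(String[] paths) {
        String[] copy = Arrays.copyOf(paths, paths.length);
        Arrays.sort(copy); // natural (lexicographic) String ordering
        return copy;
    }

    // True if every entry compares less than or equal to its successor.
    static boolean isSorted(String[] paths) {
        for (int i = 1; i < paths.length; i++) {
            if (paths[i - 1].compareTo(paths[i]) > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Same creation order as the reproduction in the issue: 2, 3, 1.
        String[] listed = {"/tmp/x/2", "/tmp/x/3", "/tmp/x/1"};
        System.out.println(isSorted(listed));
        System.out.println(Arrays.toString(sortedByName(listed)));
    }
}
```

A caller that cannot rely on the file system's listing order could apply the same kind of sort to the returned array itself; with a sort inside {{globStatus}}, that extra step is unnecessary.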
> globStatus() does not return sorted list of files
> -------------------------------------------------
>
> Key: HADOOP-10798
> URL: https://issues.apache.org/jira/browse/HADOOP-10798
> Project: Hadoop Common
> Issue Type: Bug
> Affects Versions: 2.3.0
> Reporter: Felix Borchers
> Assignee: Colin Patrick McCabe
> Priority: Minor
> Labels: BB2015-05-TBR
> Attachments: HADOOP-10798.001.patch
>
>
> (FileSystem) globStatus() no longer returns a sorted file list.
> But the API documentation says: " ... Results are sorted by their names."
> The sort seems to have been lost when the Globber object was introduced;
> I can't find a sort in the current code.
> Code to check this behavior:
> {code}
> Configuration conf = new Configuration();
> FileSystem fs = FileSystem.get(conf);
> Path path = new Path("/tmp/" + System.currentTimeMillis());
> fs.mkdirs(path);
> fs.deleteOnExit(path);
> fs.createNewFile(new Path(path, "2"));
> fs.createNewFile(new Path(path, "3"));
> fs.createNewFile(new Path(path, "1"));
> FileStatus[] status = fs.globStatus(new Path(path, "*"));
> // Collect the returned paths as strings, in the order globStatus gave them.
> List<String> list = new ArrayList<>();
> for (FileStatus f : status) {
>   list.add(f.getPath().toString());
> }
> boolean sorted = Ordering.natural().isOrdered(list);
> Assert.assertTrue(sorted);
> {code}