[ 
https://issues.apache.org/jira/browse/HADOOP-10798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592311#comment-14592311
 ] 

Colin Patrick McCabe commented on HADOOP-10798:
-----------------------------------------------

OK, I did a little more research into this.  The {{globStatus}} code back in 
2.0.3-alpha did sort the entries it returned.  Around Hadoop 2.3, the sort was 
lost during the globber rewrite.  This was a bug, but it was hidden by the fact 
that HDFS sorts its {{listStatus}} entries (this behavior is undocumented, but 
100% consistent).

Since the API documentation says that sorted entries are returned, and since 
this is the case for the vast majority of use-cases (i.e. when using Hadoop 
with HDFS), I think changing this behavior in {{globStatus}} would be an 
incompatible change.  Any user code relying on the old documented behavior 
would break.  Let's commit the original patch I posted to fix this situation.  
If we want to have a discussion about changing the API contract we can have 
that discussion for Hadoop 3.0 only.
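
As a rough illustration of why the missing sort went unnoticed on HDFS and what restoring it amounts to, here is a minimal sketch. It is not the attached patch and does not use the Hadoop API: plain strings stand in for paths, and the names {{GlobSortSketch}}, {{sortedMatches}} and {{isSorted}} are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class GlobSortSketch {
    // Hypothetical stand-in for the last step of a glob:
    // copy the matched names and sort them in natural String order.
    static List<String> sortedMatches(List<String> matches) {
        List<String> result = new ArrayList<String>(matches);
        Collections.sort(result);
        return result;
    }

    // Returns true if every entry is <= the entry that follows it.
    static boolean isSorted(List<String> names) {
        for (int i = 1; i < names.size(); i++) {
            if (names.get(i - 1).compareTo(names.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A listing that already arrives in name order (as HDFS is described above).
        List<String> hdfsLike = Arrays.asList("/tmp/t/1", "/tmp/t/2", "/tmp/t/3");
        // A listing from a file system that makes no ordering promise (assumed for illustration).
        List<String> otherFs = Arrays.asList("/tmp/t/2", "/tmp/t/3", "/tmp/t/1");

        System.out.println(isSorted(hdfsLike));               // true: the missing sort is hidden
        System.out.println(isSorted(otherFs));                // false: the missing sort is visible
        System.out.println(isSorted(sortedMatches(otherFs))); // true: an explicit sort restores the contract
    }
}
```

The explicit sort is a no-op in effect for input that is already ordered, which is why the documented contract can be honored without changing what HDFS users see.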

Also, I feel that the facts that:
1. globStatus has historically had a sort in it
2. users who want to optimize by avoiding a sort can use listStatus

strongly suggest that changing this behavior is not a good idea, even in 3.x.

> globStatus() does not return sorted list of files
> -------------------------------------------------
>
>                 Key: HADOOP-10798
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10798
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.3.0
>            Reporter: Felix Borchers
>            Assignee: Colin Patrick McCabe
>            Priority: Minor
>              Labels: BB2015-05-TBR
>         Attachments: HADOOP-10798.001.patch
>
>
> (FileSystem) globStatus() does not return a sorted file list anymore.
> But the API says: " ... Results are sorted by their names."
> The sort seems to have been lost when the Globber object was introduced; I 
> can't find a sort in the current code.
> Code to check this behavior:
> {code}
>         Configuration conf = new Configuration();
>         FileSystem fs = FileSystem.get(conf);
>         Path path = new Path("/tmp/" + System.currentTimeMillis());
>         fs.mkdirs(path);
>         fs.deleteOnExit(path);
>         fs.createNewFile(new Path(path, "2"));
>         fs.createNewFile(new Path(path, "3"));
>         fs.createNewFile(new Path(path, "1"));
>         FileStatus[] status = fs.globStatus(new Path(path, "*"));
>         Collection<String> list = new ArrayList<String>();
>         for (FileStatus f: status) {
>             list.add(f.getPath().toString());
>             //System.out.println(f.getPath().toString());
>         }
>         boolean sorted = Ordering.natural().isOrdered(list);
>         Assert.assertTrue(sorted);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
