[ 
https://issues.apache.org/jira/browse/CRUNCH-23?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436211#comment-13436211
 ] 

Gabriel Reid commented on CRUNCH-23:
------------------------------------

I was just about to look into this as well, and I've got a couple of 
questions. Is CRUNCH-23-sorting-issue.patch the full cumulative patch? Also, 
from a quick look, it appears that it might rely on SequenceFiles (and 
therefore wouldn't work with Avro). Any idea if this is the case?
                
> PCollection#sort doesn't do a full sort on values
> -------------------------------------------------
>
>                 Key: CRUNCH-23
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-23
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Gabriel Reid
>            Assignee: Rahul Sharma
>         Attachments: 0001-CRUNCH-23-fix-sorting.patch, 
> CRUNCH-23-sorting-issue.patch, 
> CRUNCH-23-used-TotalOrderpartioner-for-sorting-keys.patch, SortTest.java
>
>
> When a PCollection is sorted (using PCollection#sort), the sort is performed 
> only within each reducer, not as a total sort over all values. This means 
> that the values are not in sorted order when they are iterated over on a 
> materialized collection. It also means that the sorted files output by a 
> sort operation cannot simply be concatenated to produce a single sorted 
> file.
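
The behaviour described above can be reproduced outside of Crunch with a
small, self-contained Java sketch. It is not taken from any of the attached
patches and does not use the Crunch or Hadoop APIs; the class name
TotalOrderSketch, its methods, and the split point are all hypothetical and
chosen only for illustration. It compares sorting within hash-assigned
partitions against sorting within range-assigned partitions, which is the
general idea behind a total-order partitioner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.IntUnaryOperator;

// Hypothetical illustration only: not Crunch code and not the approach of any attached patch.
final class TotalOrderSketch {

    // Hash-style partitioning: the target "reducer" is unrelated to the value's sort order.
    static int hashPartition(int value, int numPartitions) {
        return Math.floorMod(Integer.hashCode(value), numPartitions);
    }

    // Range partitioning: ascending split points send smaller values to earlier partitions.
    static int rangePartition(int value, int[] splitPoints) {
        int partition = 0;
        while (partition < splitPoints.length && value >= splitPoints[partition]) {
            partition++;
        }
        return partition;
    }

    // Sort each partition independently, then concatenate the partitions in index order,
    // mimicking the concatenation of per-reducer output files.
    static List<Integer> sortWithinPartitionsThenConcat(
            List<Integer> values, int numPartitions, IntUnaryOperator partitioner) {
        List<List<Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (int value : values) {
            partitions.get(partitioner.applyAsInt(value)).add(value);
        }
        List<Integer> concatenated = new ArrayList<>();
        for (List<Integer> partition : partitions) {
            Collections.sort(partition);
            concatenated.addAll(partition);
        }
        return concatenated;
    }

    static boolean isSorted(List<Integer> values) {
        for (int i = 1; i < values.size(); i++) {
            if (values.get(i - 1) > values.get(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(5, 2, 8, 1, 9, 4, 7, 3);
        int[] splitPoints = {5}; // assumed split point; a real job would sample the data

        List<Integer> hashed =
                sortWithinPartitionsThenConcat(input, 2, v -> hashPartition(v, 2));
        List<Integer> ranged =
                sortWithinPartitionsThenConcat(input, 2, v -> rangePartition(v, splitPoints));

        System.out.println("hash partitions:  " + hashed + " sorted=" + isSorted(hashed));
        System.out.println("range partitions: " + ranged + " sorted=" + isSorted(ranged));
    }
}
```

With hash partitioning each partition is sorted on its own, but the
concatenation is not. With range partitioning every value in an earlier
partition is smaller than every value in a later one, so the concatenation
is globally sorted. Whether a fix along these lines would also work for
Avro-backed collections is not something this sketch addresses.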

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
