[ https://issues.apache.org/jira/browse/CRUNCH-23?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440170#comment-13440170 ]
Rahul Sharma commented on CRUNCH-23:
------------------------------------
Gabriel, I checked the patch against Avro files and it does not work. My bad, I
should have verified it earlier. I am also stuck at one point while fixing it:
in the end, the TotalOrderPartitioner requires a SequenceFile. How can I make one
using the keys from Avro data? I am still trying out a few options, e.g. configuring
AvroSequenceFileOutputFormat.
> PCollection#sort doesn't do a full sort on values
> -------------------------------------------------
>
> Key: CRUNCH-23
> URL: https://issues.apache.org/jira/browse/CRUNCH-23
> Project: Crunch
> Issue Type: Bug
> Reporter: Gabriel Reid
> Assignee: Rahul Sharma
> Attachments: 0001-CRUNCH-23-fix-sorting.patch,
> CRUNCH-23-sorting-issue.patch,
> CRUNCH-23-used-TotalOrderpartioner-for-sorting-keys.patch, SortTest.java
>
>
> When a PCollection is sorted (using PCollection#sort), the sort is
> performed only per reducer, not as a total sort over all values. This
> means that the values are not in sorted order when a materialized
> collection is iterated over. It also means that the sorted files output
> by a sort operation cannot simply be concatenated to produce a single
> sorted file.
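
The difference between a per-reducer sort and a total sort can be sketched in plain Java, without Hadoop or Crunch. This is only an illustration under stated assumptions: the class and method names (TotalOrderSketch, splitPoints, rangePartition, partitionSortConcat) are hypothetical, the keys are plain ints, and the split points are taken from the full key set rather than from a sample. It is not the code from any of the attached patches. The idea it shows is the one behind a total-order partitioner: route keys to partitions by range, using sorted split points, so that sorting each partition and concatenating the outputs gives a globally sorted result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class TotalOrderSketch {

    // Pick (numPartitions - 1) evenly spaced split points from the sorted sample.
    static int[] splitPoints(int[] sample, int numPartitions) {
        int[] sorted = sample.clone();
        Arrays.sort(sorted);
        int[] splits = new int[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            splits[i - 1] = sorted[i * sorted.length / numPartitions];
        }
        return splits;
    }

    // Range partitioner: keys below splits[0] go to partition 0,
    // keys in [splits[i-1], splits[i]) go to partition i, and so on.
    static int rangePartition(int key, int[] splits) {
        int pos = Arrays.binarySearch(splits, key);
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    // Distribute keys, sort each partition independently (the "per reducer" sort),
    // then concatenate the partitions in order. With splits == null the keys are
    // hash-partitioned (key modulo numPartitions) instead of range-partitioned.
    static List<Integer> partitionSortConcat(int[] keys, int numPartitions, int[] splits) {
        List<List<Integer>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            parts.add(new ArrayList<>());
        }
        for (int k : keys) {
            int p = (splits == null)
                    ? Math.floorMod(k, numPartitions)
                    : rangePartition(k, splits);
            parts.get(p).add(k);
        }
        List<Integer> out = new ArrayList<>();
        for (List<Integer> part : parts) {
            Collections.sort(part);
            out.addAll(part);
        }
        return out;
    }

    // True if the list is in non-decreasing order.
    static boolean isSorted(List<Integer> xs) {
        for (int i = 1; i < xs.size(); i++) {
            if (xs.get(i - 1) > xs.get(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] keys = {42, 7, 19, 3, 88, 61, 25, 14, 70, 56};
        // Hash partitioning: each partition is sorted, the concatenation is not.
        System.out.println(isSorted(partitionSortConcat(keys, 3, null)));
        // Range partitioning: the concatenation is sorted as a whole.
        System.out.println(isSorted(partitionSortConcat(keys, 3, splitPoints(keys, 3))));
    }
}
```

In Hadoop, TotalOrderPartitioner reads its split points from a partition file, which is the SequenceFile mentioned in the comment above. The sketch computes them in memory only to keep the example self-contained.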