[ https://issues.apache.org/jira/browse/CRUNCH-23?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440217#comment-13440217 ]
Rahul Sharma commented on CRUNCH-23:
------------------------------------
I am able to create sequence files for Avro data, with AvroKey as the key
class. When such a file is read back in TotalOrderPartitioner, it throws an
exception, because it expects the key to be of type WritableComparable:
java.lang.ClassCastException: org.apache.avro.mapred.AvroKey cannot be cast to org.apache.hadoop.io.WritableComparable
    at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.readPartitions(TotalOrderPartitioner.java:295)
    at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:80)
Any suggestions?
> PCollection#sort doesn't do a full sort on values
> -------------------------------------------------
>
> Key: CRUNCH-23
> URL: https://issues.apache.org/jira/browse/CRUNCH-23
> Project: Crunch
> Issue Type: Bug
> Reporter: Gabriel Reid
> Assignee: Rahul Sharma
> Attachments: 0001-CRUNCH-23-fix-sorting.patch,
> CRUNCH-23-sorting-issue.patch,
> CRUNCH-23-used-TotalOrderpartioner-for-sorting-keys.patch, SortTest.java
>
>
> When a PCollection is sorted (using PCollection#sort), the sorting that is
> performed is only per reducer, and not an absolute sort over all values. This
> means that the values are not in sorted order if they are iterated over on a
> materialized collection. It also means that the sorted files that are output
> from a sort operation cannot simply be concatenated into a single sorted
> file.
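
For anyone following the quoted description, the difference between a
per-reducer sort and a total sort can be sketched in plain Java. This is not
Crunch or Hadoop code: the class TotalOrderSketch, its method names, the sample
values, and the split points 4 and 7 are all made up for illustration. Range
partitioning is shown only as one common way to get a total order, which is
roughly the idea behind a total-order partitioner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.IntUnaryOperator;

// Hypothetical stand-alone sketch; it does not use any Crunch or Hadoop classes.
public class TotalOrderSketch {

    // Hash-style partitioning: each "reducer" receives an arbitrary subset of keys.
    public static int hashPartition(int key, int numPartitions) {
        return Math.floorMod(Integer.hashCode(key), numPartitions);
    }

    // Range partitioning: splitPoints must be sorted and distinct. Keys below
    // splitPoints[0] go to partition 0, keys in [splitPoints[0], splitPoints[1])
    // go to partition 1, and so on.
    public static int rangePartition(int key, int[] splitPoints) {
        int pos = Arrays.binarySearch(splitPoints, key);
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    // Assigns every value to a partition, sorts each partition on its own
    // (the per-reducer sort), then concatenates the partitions in index order.
    public static List<Integer> sortAndConcatenate(
            List<Integer> values, int numPartitions, IntUnaryOperator partitioner) {
        List<List<Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (int value : values) {
            partitions.get(partitioner.applyAsInt(value)).add(value);
        }
        List<Integer> result = new ArrayList<>();
        for (List<Integer> partition : partitions) {
            Collections.sort(partition);
            result.addAll(partition);
        }
        return result;
    }

    public static boolean isSorted(List<Integer> values) {
        for (int i = 1; i < values.size(); i++) {
            if (values.get(i - 1) > values.get(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(5, 2, 9, 4, 7, 1, 8, 3, 6);
        int[] splitPoints = {4, 7}; // assumed split points for three partitions

        List<Integer> hashed =
                sortAndConcatenate(values, 3, k -> hashPartition(k, 3));
        List<Integer> ranged =
                sortAndConcatenate(values, 3, k -> rangePartition(k, splitPoints));

        System.out.println("hash partitions:  " + hashed + " sorted=" + isSorted(hashed));
        System.out.println("range partitions: " + ranged + " sorted=" + isSorted(ranged));
    }
}
```

With hash partitioning each partition is sorted internally, but the
concatenation is not. With range partitioning every key in partition i is
smaller than every key in partition i+1, so the concatenated output is sorted.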