[
https://issues.apache.org/jira/browse/CRUNCH-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902993#comment-13902993
]
Chao Shi commented on CRUNCH-329:
---------------------------------
An example stack trace:
{code}
"SpillThread" daemon prio=10 tid=0x00007f1db4e62800 nid=0x1f97 runnable [0x00007f1dab5e1000]
   java.lang.Thread.State: RUNNABLE
        at java.io.DataInputStream.readInt(DataInputStream.java:372)
        at org.apache.hadoop.io.BytesWritable.readFields(BytesWritable.java:153)
        at org.apache.crunch.types.writable.TupleWritable.readFields(TupleWritable.java:171)
        at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:125)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:968)
        at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:95)
        at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:122)
        at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:59)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1254)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:712)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1199)
{code}
{code}
> Re-add type info to TupleWritable to make fields sort correctly
> ---------------------------------------------------------------
>
> Key: CRUNCH-329
> URL: https://issues.apache.org/jira/browse/CRUNCH-329
> Project: Crunch
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.10.0, 0.8.3
> Reporter: Josh Wills
> Assignee: Josh Wills
> Fix For: 0.10.0, 0.8.3
>
> Attachments: CRUNCH-329.patch, CRUNCH-329b.patch,
> fix-ss-writables.patch
>
>
> Secondary sorts aren't currently working correctly for Writable types after
> we hacked the TupleWritable impl to make all of the fields BytesWritables
> (e.g., secondary IntWritable values will no longer be sorted correctly, even
> though everything is still grouped correctly.)
> The least-bad way that I came up with to fix this is to use integer codes for
> each possible WritableComparable type in a pipeline that we can use to decode
> what Writable type each tuple field corresponds to. This allows us to keep
> the various fields sortable while still doing a reasonable job of minimizing
> the serialization required to pass the type information along.
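
To illustrate the sorting problem and the type-code idea described above, here is a minimal, self-contained sketch in plain Java. It is not the Crunch patch: the class TypeCodeSketch, the constants INT_CODE and BYTES_CODE, and the helper methods are all hypothetical names. It assumes a field is stored as big-endian bytes and, when its type is unknown, compared as unsigned bytes in lexicographic order.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; none of these names come from Crunch.
class TypeCodeSketch {
    // Assumed integer codes identifying what type a field's bytes hold.
    static final int BYTES_CODE = 0;
    static final int INT_CODE = 1;

    // Serialize an int as 4 big-endian bytes (the layout DataOutput.writeInt uses).
    static byte[] encodeInt(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Compare two fields as opaque bytes: unsigned, lexicographic.
    static int compareAsBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // Compare two fields using the type code to pick the right ordering.
    static int compareWithCode(int code, byte[] a, byte[] b) {
        if (code == INT_CODE) {
            // Decode back to ints so that negative values sort before positive ones.
            return Integer.compare(ByteBuffer.wrap(a).getInt(), ByteBuffer.wrap(b).getInt());
        }
        // Unknown or raw-bytes code: fall back to the opaque byte ordering.
        return compareAsBytes(a, b);
    }

    public static void main(String[] args) {
        byte[] minusOne = encodeInt(-1);
        byte[] plusOne = encodeInt(1);
        // Without type info, -1 (0xFFFFFFFF) sorts after 1 (0x00000001).
        System.out.println("as bytes:  " + compareAsBytes(minusOne, plusOne));
        // With the int code, -1 sorts before 1.
        System.out.println("with code: " + compareWithCode(INT_CODE, minusOne, plusOne));
    }
}
```

Equal values still compare equal under both orderings, which is consistent with grouping staying correct while the sort order within a group is wrong.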