[ https://issues.apache.org/jira/browse/CRUNCH-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880396#comment-13880396 ]

Josh Wills commented on CRUNCH-329:
-----------------------------------

Yeah, storing the key schema in the Configuration might work, since it's the 
only one we really need to have access to during the shuffle phase. As you 
said, at that point we've basically re-invented Avro for Writables, except it 
would be slower since we couldn't do direct binary comparisons using the 
schemas (well, I mean we could, but then we would have really, really 
re-invented Avro.)

Since we're talking about re-inventing Avro anyway, we could theoretically 
force all shuffles to be done in terms of Avro schemas. We have a hack in place 
for supporting Writables through Avro, and of course all of the other primitive 
types have natural Avro correspondences. Not sure which of these options is the 
least insane at this point. ;-)

> Re-add type info to TupleWritable to make fields sort correctly
> ---------------------------------------------------------------
>
>                 Key: CRUNCH-329
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-329
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.10.0, 0.8.3
>            Reporter: Josh Wills
>            Assignee: Josh Wills
>             Fix For: 0.10.0, 0.8.3
>
>         Attachments: fix-ss-writables.patch
>
>
> Secondary sorts aren't currently working correctly for Writable types after 
> we hacked the TupleWritable impl to make all of the fields BytesWritables 
> (e.g., secondary IntWritable values will no longer be sorted correctly, even 
> though everything is still grouped correctly.)
> The least-bad way that I came up with to fix this is to use integer codes for 
> each possible WritableComparable type in a pipeline that we can use to decode 
> what Writable type each tuple field corresponds to. This allows us to keep 
> the various fields sortable while still doing a reasonable job of minimizing 
> the serialization required to pass the type information along.
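
To make the description above concrete, here is a minimal sketch of the type-code idea. It is not the attached patch and uses no Hadoop or Crunch classes: the class name, the code values, and the one-byte-code-plus-payload layout are all assumptions made for illustration. It shows why comparing serialized ints as raw bytes orders negative values after positive ones, and how a small type code in front of each field lets a comparator decode the field and compare it properly.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; names and layout are not from Crunch.
final class TypeCodedFields {
    // Assumed integer codes, one per field type used in a pipeline.
    static final byte INT_CODE = 0;
    static final byte STRING_CODE = 1;

    // Layout: [1-byte type code][4-byte big-endian int].
    static byte[] encodeInt(int v) {
        return ByteBuffer.allocate(5).put(INT_CODE).putInt(v).array();
    }

    // Layout: [1-byte type code][UTF-8 bytes].
    static byte[] encodeString(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(1 + utf8.length).put(STRING_CODE).put(utf8).array();
    }

    // Type-blind comparison: unsigned lexicographic order over the bytes,
    // which is what you get when every field is treated as opaque bytes.
    static int compareRaw(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // Type-aware comparison: read the code, decode the payload, then compare.
    static int compareTyped(byte[] a, byte[] b) {
        if (a[0] != b[0]) {
            return Byte.compare(a[0], b[0]);
        }
        switch (a[0]) {
            case INT_CODE:
                return Integer.compare(
                    ByteBuffer.wrap(a, 1, 4).getInt(),
                    ByteBuffer.wrap(b, 1, 4).getInt());
            case STRING_CODE:
                String sa = new String(a, 1, a.length - 1, StandardCharsets.UTF_8);
                String sb = new String(b, 1, b.length - 1, StandardCharsets.UTF_8);
                return sa.compareTo(sb);
            default:
                throw new IllegalArgumentException("Unknown type code: " + a[0]);
        }
    }

    public static void main(String[] args) {
        byte[] minusOne = encodeInt(-1);
        byte[] one = encodeInt(1);
        // Raw bytes put -1 after 1 because the sign bit reads as a large unsigned byte.
        System.out.println("raw:   " + compareRaw(minusOne, one));
        // Decoding by type code restores the numeric order.
        System.out.println("typed: " + compareTyped(minusOne, one));
    }
}
```

The extra cost in this sketch is one byte per field, which is the kind of trade-off the description refers to when it talks about minimizing the serialization needed to pass type information along.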



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
