[ 
https://issues.apache.org/jira/browse/HADOOP-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699395#action_12699395
 ] 

Jingkei Ly commented on HADOOP-5589:
------------------------------------

The patch could be backwards-compatible if the bitset were written to the 
stream as VLongs (essentially what was implemented in HADOOP-5589-2.patch, 
minus the bug with sparse bitsets), since the bytes written to the stream would 
be exactly the same in both implementations as long as there were fewer than 64 
values.
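
As a rough illustration of why the two formats can agree for small tuples, the
sketch below compares a single-long mask with the words of a java.util.BitSet.
It is not taken from any of the attached patches: the class and method names
are hypothetical, and the actual stream encoding (for example via Hadoop's
WritableUtils.writeVLong) is left out so that the example only needs the JDK.

```java
import java.util.BitSet;

// Hypothetical demo class; not part of Hadoop or of the HADOOP-5589 patches.
public class BitSetWordsDemo {

    // Old-style bookkeeping: one bit per written position in a single long.
    // Only meaningful for positions 0..63.
    static long legacyMask(int... positions) {
        long mask = 0L;
        for (int p : positions) {
            mask |= 1L << p;
        }
        return mask;
    }

    // New-style bookkeeping: a BitSet, exported as 64-bit words.
    // toLongArray() drops trailing all-zero words.
    static long[] bitSetWords(int... positions) {
        BitSet bits = new BitSet();
        for (int p : positions) {
            bits.set(p);
        }
        return bits.toLongArray();
    }

    public static void main(String[] args) {
        long[] small = bitSetWords(0, 3, 63);
        // One word, identical to the single-long mask.
        System.out.println(small.length == 1 && small[0] == legacyMask(0, 3, 63));

        long[] wide = bitSetWords(0, 3, 63, 70);
        // A second word appears once a position beyond 63 is set.
        System.out.println(wide.length);
    }
}
```

If each word were then written as a VLong, a tuple that fits in one word would
produce the same value to encode as the old single long, which is the basis of
the compatibility argument above.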

However, because we can't read an old TupleWritable containing more than 64 
values without throwing an EOFException, it won't be "fully" 
backwards-compatible. 

While I would be tempted to argue that TupleWritable never supported more than 
64 values in a tuple anyway, is there still a need to support users who were 
storing tuples of more than 64 values, albeit with incorrect results? 
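
To make "incorrect results" concrete, here is a minimal sketch of what happens
when written positions are tracked with shifts on a single long, as the issue
description presumes TupleWritable does. The class and method names are
hypothetical and this is not the TupleWritable source; it relies only on the
Java rule that a shift distance on a long is taken modulo 64.

```java
// Hypothetical demo class; not the actual TupleWritable implementation.
public class ShiftAliasDemo {

    // Mark a position as written in a single-long mask.
    static long setBit(long mask, int position) {
        return mask | (1L << position);
    }

    // Check whether a position is marked as written.
    static boolean isSet(long mask, int position) {
        return (mask & (1L << position)) != 0;
    }

    public static void main(String[] args) {
        // Java uses only the low six bits of the shift distance for a long,
        // so shifting by 64 is the same as shifting by 0.
        long mask = setBit(0L, 64);

        // Position 0 was never marked, yet it reads as written.
        System.out.println(isSet(mask, 0)); // prints "true"
    }
}
```

Under that assumption, a tuple with more than 64 positions would not fail
outright; positions 64 and above would silently alias positions 0 and above.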



> TupleWritable: Lift implicit limit on the number of values that can be stored
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5589
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5589
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.21.0
>            Reporter: Jingkei Ly
>            Assignee: Jingkei Ly
>         Attachments: HADOOP-5589-1.patch, HADOOP-5589-2.patch, 
> HADOOP-5589-3.patch
>
>
> TupleWritable uses an instance field of the primitive type, long, which I 
> presume is so that it can quickly determine if a position has been written to 
> in its array of Writables (by using bit-shifting operations on the long 
> field). The problem with this is that it implies that there is a maximum 
> limit of 64 values you can store in a TupleWritable.
> An example of a use-case where I think this would be a problem is if you had 
> two MR jobs with over 64 reduce tasks and you wanted to join the outputs 
> with CompositeInputFormat - this will probably cause unexpected results in 
> the current scheme.
> At the very least, the 64-value limit should be documented in TupleWritable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
