[ 
https://issues.apache.org/jira/browse/TEZ-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16479252#comment-16479252
 ] 

Jason Lowe commented on TEZ-3937:
---------------------------------

Thanks for the patch!

I love how this drastically simplifies the code for BitSet serialization, but I 
do have one concern with the change.  If I'm reading the code correctly, the 
new vesion will serialize and deserialize the bytes in the opposite order that 
the old code does.  I think it's fine as long as all callers are pairing these 
two methods together, e.g.: all calls to deserialize a BitSet are passing byte 
arrays that were encoded with the new encoded method.

What worries me there is if a job ever ran where some Tez jars happened to be 
an older version before this change.  The BitSet would be deserialized mostly 
backwards (same order of bits within a byte but opposite byte order).  That 
could lead to silent dataloss since the consumer may think a partition is empty 
when in fact it's a completely different partition is empty, due to how shuffle 
interprets these BitSets.

If accidental mixing of Tez jars isn't a concern I think this change is fine.  
If we're mostly worried about the wasted byte when the number of bits is a 
multiple of 8 then I think a safer change is to fix the toByteArray method to 
size the destination byte array properly instead.


> Empty partition BitSet to byte[] conversion creates one extra byte in 
> rounding error
> ------------------------------------------------------------------------------------
>
>                 Key: TEZ-3937
>                 URL: https://issues.apache.org/jira/browse/TEZ-3937
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jonathan Eagles
>            Assignee: Jonathan Eagles
>            Priority: Major
>         Attachments: TEZ-3937.001.patch
>
>
> Byte array length calculation is defined as (bitset.length / 8) + 1 which has 
> off by one errors on byte boundaries. For example, BitSet of length 0 is 
> converted to a byte array of length 1. This was introduced as part of TEZ-972 
> since BitSet.toByteArray and valueOf were not supported as Tez supported Java 
> 6 at the time and API was introduced in Java 7.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to