[ https://issues.apache.org/jira/browse/TEZ-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16479252#comment-16479252 ]
Jason Lowe commented on TEZ-3937: --------------------------------- Thanks for the patch! I love how this drastically simplifies the code for BitSet serialization, but I do have one concern with the change. If I'm reading the code correctly, the new vesion will serialize and deserialize the bytes in the opposite order that the old code does. I think it's fine as long as all callers are pairing these two methods together, e.g.: all calls to deserialize a BitSet are passing byte arrays that were encoded with the new encoded method. What worries me there is if a job ever ran where some Tez jars happened to be an older version before this change. The BitSet would be deserialized mostly backwards (same order of bits within a byte but opposite byte order). That could lead to silent dataloss since the consumer may think a partition is empty when in fact it's a completely different partition is empty, due to how shuffle interprets these BitSets. If accidental mixing of Tez jars isn't a concern I think this change is fine. If we're mostly worried about the wasted byte when the number of bits is a multiple of 8 then I think a safer change is to fix the toByteArray method to size the destination byte array properly instead. > Empty partition BitSet to byte[] conversion creates one extra byte in > rounding error > ------------------------------------------------------------------------------------ > > Key: TEZ-3937 > URL: https://issues.apache.org/jira/browse/TEZ-3937 > Project: Apache Tez > Issue Type: Bug > Reporter: Jonathan Eagles > Assignee: Jonathan Eagles > Priority: Major > Attachments: TEZ-3937.001.patch > > > Byte array length calculation is defined as (bitset.length / 8) + 1 which has > off by one errors on byte boundaries. For example, BitSet of length 0 is > converted to a byte array of length 1. This was introduced as part of TEZ-972 > since BitSet.toByteArray and valueOf were not supported as Tez supported Java > 6 at the time and API was introduced in Java 7. -- This message was sent by Atlassian JIRA (v7.6.3#76005)