[
https://issues.apache.org/jira/browse/HADOOP-6298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971034#action_12971034
]
David Rosenstrauch commented on HADOOP-6298:
--------------------------------------------
Yeesh. I just got bitten by this same bug, but from a different direction.
Calling BytesWritable.getBytes() returns a reference to the BytesWritable's
internal byte array. I was calling that, and then using that byte array in
subsequent processing. The problem is that the BytesWritable was still holding
a reference to that same array (not a copy) and later modified it, thus
modifying the array I was working with as well. This was a really subtle bug
that was hard to find, and I wasted a lot of time on it.
I realize there's a need to get access to a BytesWritable's internal byte
storage without performing an array copy. But again, I think there needs to be
some additional *safe* method to retrieve a byte array that's a *copy* of a
BytesWritable's contents. There are just too many potential pitfalls for
developers if the situation is left as is.
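
To make the failure mode concrete, here is a minimal sketch. It does not use
Hadoop itself: ReusableBytes is a hypothetical stand-in that only mimics the
behavior described above (getBytes() hands back the internal array, and a
later set() overwrites that array in place). The copyBytes() name is likewise
made up for illustration of what a *safe* accessor could look like.

```java
import java.util.Arrays;

/** Hypothetical stand-in for BytesWritable; NOT the Hadoop class. */
class ReusableBytes {
    private byte[] bytes = new byte[0];
    private int length = 0;

    /** Overwrites the contents, reusing the internal array when it is big enough. */
    void set(byte[] src) {
        if (src.length > bytes.length) {
            bytes = new byte[src.length];
        }
        System.arraycopy(src, 0, bytes, 0, src.length);
        length = src.length;
    }

    /** Returns the internal array itself, with no copy made. */
    byte[] getBytes() { return bytes; }

    int getLength() { return length; }

    /** Illustrative "safe" accessor: an independent copy of the valid bytes. */
    byte[] copyBytes() { return Arrays.copyOf(bytes, length); }
}

class AliasingDemo {
    public static void main(String[] args) {
        ReusableBytes w = new ReusableBytes();
        w.set(new byte[] {1, 2, 3});

        byte[] aliased = w.getBytes();  // same array the object keeps using
        byte[] copied = w.copyBytes();  // independent snapshot

        w.set(new byte[] {9, 9, 9});    // the object is reused for a new value

        System.out.println(Arrays.toString(aliased)); // [9, 9, 9]
        System.out.println(Arrays.toString(copied));  // [1, 2, 3]
    }
}
```

The aliased array silently changes when the object is reused, while the copy
keeps the value that was read. That is the trade-off: the copy costs an
allocation, but it cannot be modified behind the caller's back.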
> BytesWritable#getBytes is a bad name that leads to programming mistakes
> -----------------------------------------------------------------------
>
> Key: HADOOP-6298
> URL: https://issues.apache.org/jira/browse/HADOOP-6298
> Project: Hadoop Common
> Issue Type: Improvement
> Affects Versions: 0.20.1
> Reporter: Nathan Marz
>
> Pretty much everyone at Rapleaf who has worked with Hadoop has misused
> BytesWritable#getBytes at some point, not expecting the byte array to be
> padded. I think we can completely alleviate these programming mistakes by
> deprecating and renaming this method (again) to be more descriptive. I
> propose "getPaddedBytes()" or "getPaddedValue()". It would also be helpful to
> have a helper method "getNonPaddedValue()" that makes a copy into a
> non-padded byte array.
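
For reference, the padding problem in the description above can be sketched
the same way. PaddedBytes is again a hypothetical stand-in, not the Hadoop
class, and its 1.5x growth factor is an arbitrary assumption chosen only to
make the backing array larger than the value. getNonPaddedValue() borrows the
name proposed in the description; it is not an existing method.

```java
import java.util.Arrays;

/** Hypothetical stand-in; NOT the Hadoop class. Growth factor is an assumption. */
class PaddedBytes {
    private byte[] bytes = new byte[0];
    private int length = 0;

    void set(byte[] src) {
        if (src.length > bytes.length) {
            // Over-allocate so slightly larger values fit later without reallocating.
            bytes = new byte[src.length * 3 / 2];
        }
        System.arraycopy(src, 0, bytes, 0, src.length);
        length = src.length;
    }

    /** Returns the whole backing array, including unused trailing capacity. */
    byte[] getBytes() { return bytes; }

    /** Number of bytes that actually hold the value. */
    int getLength() { return length; }

    /** Helper in the spirit of the proposal: a copy trimmed to the valid length. */
    byte[] getNonPaddedValue() { return Arrays.copyOf(bytes, length); }
}

class PaddingDemo {
    public static void main(String[] args) {
        PaddedBytes w = new PaddedBytes();
        w.set(new byte[] {1, 2, 3, 4});

        System.out.println(w.getBytes().length);          // 6 (capacity, padded)
        System.out.println(w.getLength());                // 4 (valid bytes)
        System.out.println(w.getNonPaddedValue().length); // 4
    }
}
```

Code that treats getBytes() as "the value" picks up the two trailing zero
bytes here, which is the mistake the description is about.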