On Jul 20, 2010, at 11:24 AM, Scott Carey wrote:

> 
>> 
>> This sounds like a bug.
>> 
>> Let's say you create a Text object and drop in a String that sets the byte 
>> array length to 200.  Then drop in a second String that sets the byte 
>> array length to 500.  Since the new length is greater than the previous 
>> length, the byte array length is reset to the longer length.  Now, if you 
>> drop in a third String that would set the byte array length to 350, the Text 
>> object does not replace the byte array with a new one of length 350; it keeps 
>> the greater length of 500 and sets an extra variable to track the "real" 
>> length.
>> 
>> So: Text.getBytes().length != Text.getLength()
>> 
>> This does 2 things:
>> 
>> 1. Passes around more data than what is needed
>> 2. Makes the Text object confusing to work with
>> 
>> Text.getBytes().length == Text.getLength() - should be the correct behavior.
>> 
>> 
> 
> I don't think so.  Passing around byte arrays larger than the valid data is 
> common practice in Java for performance reasons.  Hence the common method 
> signature containing (byte[] bytes, int offset, int len) and similar.  
> Creating a new byte array for each resize defeats the purpose of re-using the 
> byte array and the Text object -- lower memory allocation and improved CPU 
> cache locality.  The byte array here is a buffer; it does not represent the 
> entire string.
> 

To be more specific here, shouldn't Text.toString() do the trick?  If 
Text.toString() doesn't work, or does something other than what you expect 
here, that should be documented and the class should have another helper method 
that gets you a String from Text.  Calling getBytes() and manually 
constructing a string means you should know what those bytes represent -- a 
buffer where the bytes for the string run from index 0 up to (but not 
including) Text.getLength().
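
As a rough sketch of the buffer-plus-length idea (this is not the actual Text 
source; ReusableText and its copyBytes() helper are hypothetical names made up 
for illustration, and UTF-8 is assumed as the encoding), something like the 
following shows why getBytes().length and getLength() can differ, and how to 
decode only the valid range:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for a reusable text buffer, written only to
// illustrate the idea discussed above. It is NOT Hadoop's Text class.
final class ReusableText {
    private byte[] bytes = new byte[0]; // backing buffer; may be larger than the data
    private int length = 0;             // number of valid bytes in the buffer

    // Store a new value, allocating a new buffer only when the old one is too small.
    void set(String s) {
        byte[] encoded = s.getBytes(StandardCharsets.UTF_8);
        if (encoded.length > bytes.length) {
            bytes = new byte[encoded.length];
        }
        System.arraycopy(encoded, 0, bytes, 0, encoded.length);
        length = encoded.length;
    }

    // Returns the raw buffer; only indices [0, getLength()) hold valid data.
    byte[] getBytes() { return bytes; }

    int getLength() { return length; }

    // Decode only the valid range, ignoring stale bytes past 'length'.
    @Override
    public String toString() {
        return new String(bytes, 0, length, StandardCharsets.UTF_8);
    }

    // Hypothetical helper for callers that really need an exactly sized array.
    byte[] copyBytes() { return Arrays.copyOf(bytes, length); }
}
```

With values of 200, 500 and then 350 bytes set in that order, the buffer in 
this sketch stays at 500 bytes while getLength() reports 350, and toString() 
still returns only the 350 valid bytes.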
