> > This sounds like a bug. > > Let's say you create a Text object and drop in a String that sets the byte > array length to 200. Then drop in a a second String that sets the byte array > length to 500. Since, the new length is greater than the previous length; > the byte array length is reset to the longer length. Now, if you drop in a > third String that would set the byte array length to 350; the Text object > does not replace the byte array with a new length of 350; it utilizes the > greater length of 500 and sets an extra variable to track the "real" length. > > So: Text.getBytes().length != Text.getLength() > > This does 2 things: > > 1. Passes around more data than what is needed > 2. Makes the Text object confusing to work with > > Text.getBytes().length == Text.getLength() - should be the correct behavior. > >
I don't think so. Passing around byte arrays larger than the valid data is common practice in Java for performance reasons. Hence, the common method signature containing (byte[] bytes, int len, int offset) and similar. Creating a new byte array for each resize defeats the purpose of re-using the byte array and the Text object -- lower memory allocation and improved CPU cache locality. The byte array here is a buffer, it does not represent the entire string.
