The backing array of a Text object is typically larger than the
number of bytes actually in use; only the first getLength() bytes
returned by getBytes() are valid.
If you use the alternate constructor for ImmutableBytesWritable:

new ImmutableBytesWritable(input.getBytes(), 0, input.getLength());

the test will pass.
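
To illustrate why the extra byte matters, here is a minimal plain-Java sketch. It does not use Hadoop at all: BackingArrayDemo, backingArray and validBytes are hypothetical names that only mimic a container whose backing array (16 bytes) is longer than its valid length (15 bytes), as in the debugger output quoted below.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for a growable byte container such as Text:
// the backing array may be longer than the number of valid bytes.
class BackingArrayDemo {

    // Returns the encoded string zero-padded up to the given capacity.
    static byte[] backingArray(String s, int capacity) {
        byte[] valid = s.getBytes(StandardCharsets.UTF_8);
        return Arrays.copyOf(valid, Math.max(capacity, valid.length));
    }

    // Copies only the valid region [offset, offset + length).
    static byte[] validBytes(byte[] buf, int offset, int length) {
        return Arrays.copyOfRange(buf, offset, offset + length);
    }

    public static void main(String[] args) {
        String s = "this is a test.";
        int length = s.getBytes(StandardCharsets.UTF_8).length; // 15 valid bytes
        byte[] buf = backingArray(s, 16);                       // 16-byte array

        // Using the whole array drags the padding byte along.
        String whole = new String(buf, StandardCharsets.UTF_8);
        // Using offset and length keeps only the valid bytes.
        String trimmed =
            new String(validBytes(buf, 0, length), StandardCharsets.UTF_8);

        System.out.println(whole.equals(s));   // false
        System.out.println(trimmed.equals(s)); // true
    }
}
```

The same reasoning applies to the constructor call above: passing the offset and length limits the wrapped bytes to the part of the array that is actually in use.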

One more note: relying on the default encoding for Strings may work
on any single machine, but if one machine's default encoding is,
say, ISO-8859-1 and another's is UTF-8, passing an
ImmutableBytesWritable from one machine to the other can result in
the String being decoded incorrectly. For this reason, we always
specify an encoding for String.getBytes and in the String constructor:

new ImmutableBytesWritable("this is a string".getBytes("UTF-8"))

and

new String(ibw.get(), "UTF-8")
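
As a rough sketch of the encoding point, the plain-Java example below encodes a string as UTF-8 and then decodes it both with the matching charset and with a mismatched one. CharsetRoundTrip, encode and decode are hypothetical names, and the sketch uses the StandardCharsets constants from newer Java releases in place of the "UTF-8" string shown above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: always encode and decode with an explicit charset.
class CharsetRoundTrip {

    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String decode(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // contains one non-ASCII character
        byte[] wire = encode(original);

        // Simulates a receiver whose default encoding is not UTF-8.
        String mismatched = new String(wire, StandardCharsets.ISO_8859_1);

        System.out.println(decode(wire).equals(original)); // true
        System.out.println(mismatched.equals(original));   // false
    }
}
```

Pure ASCII text such as "this is a test." happens to survive this particular mismatch, which is why the problem often goes unnoticed until non-ASCII data shows up.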

---
Jim Kellerman, Senior Engineer; Powerset


> -----Original Message-----
> From: Jason Grey [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, November 21, 2007 8:27 AM
> To: hadoop-user@lucene.apache.org
> Subject: Text and/or ImmutableBytesWritable issue?
>
> Can anyone explain why "testTextToBytes" doesn't assert and
> "testStringToBytes" does?
>
>
> import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
> import org.apache.hadoop.io.Text;
>
> import junit.framework.TestCase;
>
> public class TestImmutableBytesWritable extends TestCase {
>
>         public void testTextToBytes(){
>
>                 Text input = new Text("this is a test.");
>
>                 ImmutableBytesWritable bytes =
>                         new ImmutableBytesWritable( input.getBytes() );
>
>                 Text output = new Text( bytes.get() );
>
>                 assertEquals(input, output);
>
>         }
>
>         public void testStringToBytes(){
>
>                 String input = "this is a test.";
>
>                 ImmutableBytesWritable bytes =
>                         new ImmutableBytesWritable( input.getBytes() );
>
>                 String output = new String( bytes.get() );
>
>                 assertEquals(input, output);
>
>         }
> }
>
>
> If I inspect the objects during debugging at the point of the
> assert I see the following:
>
> * input
>         bytes = [116, 104, 105, 115, 32, 105
>                 , 115, 32, 97, 32, 116, 101
>                 , 115, 116, 46, 0]
>         length = 15
>
> * bytes =       [116, 104, 105, 115, 32, 105
>                 , 115, 32, 97, 32, 116, 101
>                 , 115, 116, 46, 0]
>
> * output
>         bytes = [116, 104, 105, 115, 32, 105
>                 , 115, 32, 97, 32, 116, 101
>                 , 115, 116, 46, 0]
>         length = 16
>
> The length property appears to be off between the two Text
> objects, but all the data is correct... any help would be
> greatly appreciated.
>
> Thanks
>
> -jg-
>
>
