Chris, thanks. I see now.
Internally, I use String instead of Text, so I use WritableUtils.writeString(...) and not Text.write(...). In the latter method, I see that it calls WritableUtils.writeVInt(...) before out.write(byte[], start, length). Tom White uses Text internally to represent strings (which is maybe what I should do), so his example is correct and works. I think I was just confusing myself. Thanks for the last paragraph too; that really helped a lot.

On Sat, Mar 31, 2012 at 1:17 PM, Chris White <chriswhite...@gmail.com> wrote:

> A Text object is written out as a vint representing the number of bytes,
> followed by the byte array contents of the Text object.
>
> Because a vint can be between 1 and 5 bytes in length, the decodeVIntSize
> method examines the first byte of the vint to work out how many bytes to
> skip over before the text bytes start.
>
> readVInt then actually reads the vint bytes to get the length of the
> following byte array.
>
> So when you call the compareBytes method, you need to pass in where the
> actual bytes start (s1 + vIntLen) and how many bytes to compare (vint).
>
> On Mar 31, 2012 12:38 AM, "Jane Wayne" <jane.wayne2...@gmail.com> wrote:
>
> > In Tom White's book, Hadoop: The Definitive Guide (second edition),
> > on page 99, he shows how to compare the raw bytes of a key with Text
> > fields. He shows an example like the following:
> >
> > int firstL1 = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);
> > int firstL2 = WritableUtils.decodeVIntSize(b2[s2]) + readVInt(b2, s2);
> >
> > His explanation is that firstL1 is the length of the first String/Text in
> > b1, and firstL2 is the length of the first String/Text in b2. But I'm
> > unsure of what the code is actually doing.
> >
> > What is WritableUtils.decodeVIntSize(...) doing?
> > What is WritableComparator.readVInt(...) doing?
> > Why do we have to add the outputs of these two methods to get the length
> > of the String/Text?
> >
> > Could someone please explain in plain terms what's happening here? It
> > seems WritableComparator.readVInt(...) is already getting the length of
> > the byte[] corresponding to the string. It seems
> > WritableUtils.decodeVIntSize(...) is also doing the same thing (from
> > reading the javadoc).
> >
> > When I look at WritableUtils.writeString(...), two things happen: the
> > length of the byte[] is written, followed by the byte[] itself. Why
> > can't we simply do something like the following to get the length?
> >
> > int firstL1 = readInt(b1[s1]);
> > int firstL2 = readInt(b2[s2]);
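Chris's explanation can be made concrete with a small standalone sketch. The methods decodeVIntSize, readVInt and compareBytes below are hypothetical re-implementations of the byte logic behind Hadoop's WritableUtils.decodeVIntSize(...), WritableComparator.readVInt(...) and WritableComparator.compareBytes(...). They are written out so the example runs without Hadoop on the classpath. In a real RawComparator you would call the Hadoop methods instead. textRecord is also a hypothetical helper. It only builds strings under 128 bytes, where the vint header is a single byte. The sketch shows two things. First, the book's firstL1 (header size plus payload length) is the length of the whole serialized field. Second, the comparison itself starts at s1 + vIntLen and covers readVInt(...) bytes.

```java
import java.nio.charset.StandardCharsets;

/**
 * Standalone sketch of the raw first-field comparison described in this thread.
 * decodeVIntSize, readVInt and compareBytes re-implement the byte logic of the
 * Hadoop methods with the same names so this runs without Hadoop on the
 * classpath; a real RawComparator would call Hadoop's versions instead.
 */
public class FirstFieldCompare {

    // Total size of the vint (header byte plus payload bytes) whose first byte is given.
    static int decodeVIntSize(byte first) {
        if (first >= -112) {
            return 1;                   // value stored directly in one byte
        }
        if (first < -120) {
            return -119 - first;        // negative value: 1 + payload bytes
        }
        return -111 - first;            // positive value: 1 + payload bytes
    }

    // Decodes the vint starting at bytes[start] (for Text, the byte length).
    static int readVInt(byte[] bytes, int start) {
        int first = bytes[start];
        if (first >= -112) {
            return first;
        }
        boolean negative = first < -120;
        int payload = negative ? -(first + 120) : -(first + 112);
        long value = 0;
        for (int idx = 0; idx < payload; idx++) {   // payload is big-endian
            value = (value << 8) | (bytes[start + 1 + idx] & 0xFF);
        }
        return (int) (negative ? ~value : value);
    }

    // Lexicographic comparison treating bytes as unsigned.
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int a = b1[s1 + i] & 0xFF;
            int b = b2[s2 + i] & 0xFF;
            if (a != b) {
                return a - b;
            }
        }
        return l1 - l2;                 // a shorter prefix sorts first
    }

    // firstL from the book: header size + text length = whole serialized field.
    static int firstFieldLength(byte[] b, int s) {
        return decodeVIntSize(b[s]) + readVInt(b, s);
    }

    // Compares only the text bytes: start after the header, compare vint-many bytes.
    static int compareFirstField(byte[] b1, int s1, byte[] b2, int s2) {
        int vIntLen1 = decodeVIntSize(b1[s1]);
        int vIntLen2 = decodeVIntSize(b2[s2]);
        return compareBytes(b1, s1 + vIntLen1, readVInt(b1, s1),
                            b2, s2 + vIntLen2, readVInt(b2, s2));
    }

    // Hypothetical helper: "vint length + UTF-8 bytes" for strings under 128
    // bytes, where the vint is simply the length in a single byte.
    static byte[] textRecord(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        if (utf8.length > 127) {
            throw new IllegalArgumentException("sketch only builds 1-byte vint headers");
        }
        byte[] record = new byte[1 + utf8.length];
        record[0] = (byte) utf8.length;
        System.arraycopy(utf8, 0, record, 1, utf8.length);
        return record;
    }

    public static void main(String[] args) {
        byte[] apple = textRecord("apple");
        byte[] banana = textRecord("banana");
        System.out.println(firstFieldLength(apple, 0));                  // 6
        System.out.println(compareFirstField(apple, 0, banana, 0) < 0);  // true
        byte[] header300 = {(byte) -114, 1, 44};   // vint header for length 300
        System.out.println(decodeVIntSize(header300[0]) + " " + readVInt(header300, 0)); // 3 300
    }
}
```

The two methods answer different questions. decodeVIntSize(...) looks only at the first byte and reports how big the header is. readVInt(...) decodes the header to get the payload length. Neither one alone gives the full field length when the header is longer than one byte.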
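The distinction in the reply at the top of the thread can be seen by serializing the same string both ways. This sketch relies on two facts about Hadoop's serialization. Text.write(...) writes the length as a vint. WritableUtils.writeString(...) writes it as a fixed 4-byte int via out.writeInt(...). In both cases the UTF-8 bytes follow. The names writeVLong, textStyle and writeStringStyle are hypothetical. writeVLong re-implements Hadoop's vint byte layout only so the example runs without Hadoop.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Collections;

/**
 * Sketch contrasting the two length prefixes discussed in this thread.
 * writeVLong re-implements the byte layout of Hadoop's zero-compressed vint
 * (the format Text.write uses for the length) so this runs without Hadoop.
 */
public class LengthPrefixDemo {

    // Same byte layout as Hadoop's WritableUtils.writeVLong / writeVInt.
    static void writeVLong(ByteArrayOutputStream out, long i) {
        if (i >= -112 && i <= 127) {
            out.write((int) i);                 // small values: a single byte
            return;
        }
        int len = -112;
        if (i < 0) {
            i ^= -1L;                           // negatives stored as one's complement
            len = -120;
        }
        for (long tmp = i; tmp != 0; tmp >>= 8) {
            len--;                              // one step per payload byte
        }
        out.write(len);                         // header byte encodes sign and size
        int payload = (len < -120) ? -(len + 120) : -(len + 112);
        for (int idx = payload; idx != 0; idx--) {
            out.write((int) ((i >> ((idx - 1) * 8)) & 0xFF)); // big-endian payload
        }
    }

    // Text-style field: vint length, then the UTF-8 bytes.
    static byte[] textStyle(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    // writeString-style field: fixed 4-byte big-endian int length (the same
    // bytes DataOutput.writeInt produces), then the UTF-8 bytes.
    static byte[] writeStringStyle(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + utf8.length).putInt(utf8.length).put(utf8).array();
    }

    public static void main(String[] args) {
        String shortStr = "hadoop";                                      // 6 bytes
        String longStr = String.join("", Collections.nCopies(300, "x")); // 300 bytes
        System.out.println(textStyle(shortStr).length);         // 7   (1 + 6)
        System.out.println(writeStringStyle(shortStr).length);  // 10  (4 + 6)
        System.out.println(textStyle(longStr).length);          // 303 (3 + 300)
        System.out.println(writeStringStyle(longStr).length);   // 304 (4 + 300)
        // A fixed 4-byte prefix can be read back with a plain big-endian int read.
        System.out.println(ByteBuffer.wrap(writeStringStyle(shortStr)).getInt(0)); // 6
    }
}
```

With the writeString-style prefix, the header is always 4 bytes. The string bytes therefore always start at s1 + 4, and a plain int read of the prefix gives the length. With Text, the header size varies with the length, which is why the comparator first needs decodeVIntSize(...).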