In Tom White's book, Hadoop: The Definitive Guide (second edition, page 99), he shows how to compare the raw bytes of a key with Text fields. He gives an example like the following.

    int firstL1 = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);
    int firstL2 = WritableUtils.decodeVIntSize(b2[s2]) + readVInt(b2, s2);

His explanation is that firstL1 is the length of the first String/Text in b1, and firstL2 is the length of the first String/Text in b2. But I'm unsure what the code is actually doing. What is WritableUtils.decodeVIntSize(...) doing? What is WritableComparator.readVInt(...) doing? Why do we have to add the outputs of these two methods to get the length of the String/Text? Could someone please explain in plain terms what's happening here?

It seems WritableComparator.readVInt(...) is already getting the length of the byte[] corresponding to the string. From reading the Javadoc, it seems WritableUtils.decodeVIntSize(...) is doing the same thing.

When I look at WritableUtils.writeString(...), two things happen: the length of the byte[] is written, and then the byte[] itself is written. Why can't we simply do something like the following to get the length?

    int firstL1 = readInt(b1, s1);
    int firstL2 = readInt(b2, s2);
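
For context, here is a minimal, self-contained Java sketch of the byte layout in question. It assumes that each Text field is serialized as a variable-length integer (vint) holding the byte count, followed by the UTF-8 bytes. It does not call Hadoop: the vint helpers are re-implemented locally, following the encoding used by Hadoop's WritableUtils and WritableComparator, and only non-negative values are written. The class name VIntSketch and the helpers serialize and firstFieldLength are hypothetical names used for illustration only.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class VIntSketch {

    // How many bytes the vint occupies, judged from its first byte alone.
    public static int decodeVIntSize(byte first) {
        if (first >= -112) return 1;            // value fits in the first byte itself
        if (first < -120) return -119 - first;  // negative multi-byte value
        return -111 - first;                    // positive multi-byte value
    }

    // The integer value stored in the vint that starts at bytes[start].
    public static int readVInt(byte[] bytes, int start) {
        int len = bytes[start];
        if (len >= -112) return len;
        boolean negative = len < -120;
        len = negative ? -(len + 120) : -(len + 112);
        long value = 0;
        for (int i = 0; i < len; i++) {
            value = (value << 8) | (bytes[start + 1 + i] & 0xFF);
        }
        return (int) (negative ? ~value : value);
    }

    // Simplified writer: handles non-negative values only (enough for lengths).
    public static void writeVInt(ByteArrayOutputStream out, int value) {
        if (value < 0) throw new IllegalArgumentException("sketch handles lengths only");
        if (value <= 127) { out.write(value); return; }
        int count = 0;
        for (int tmp = value; tmp != 0; tmp >>>= 8) count++;
        out.write(-112 - count);                // header byte encodes the payload size
        for (int i = count - 1; i >= 0; i--) out.write((value >>> (8 * i)) & 0xFF);
    }

    // Assumed layout of a key made of Text fields: [vint length][UTF-8 bytes] per field.
    public static byte[] serialize(String... fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String field : fields) {
            byte[] utf8 = field.getBytes(StandardCharsets.UTF_8);
            writeVInt(out, utf8.length);
            out.write(utf8, 0, utf8.length);
        }
        return out.toByteArray();
    }

    // Same expression as in the book: size of the length prefix plus the length it stores.
    public static int firstFieldLength(byte[] b, int s) {
        return decodeVIntSize(b[s]) + readVInt(b, s);
    }

    public static void main(String[] args) {
        byte[] shortKey = serialize("hello", "world!");
        int firstL1 = firstFieldLength(shortKey, 0);
        System.out.println(firstL1);                      // 1 prefix byte + 5 payload bytes = 6
        System.out.println(readVInt(shortKey, firstL1));  // second field's length: 6

        String longField = new String(new char[200]).replace('\0', 'a');
        byte[] longKey = serialize(longField, "x");
        System.out.println(firstFieldLength(longKey, 0)); // 2 prefix bytes + 200 payload bytes = 202
    }
}
```

Under these assumptions, readVInt returns only the number of payload bytes, while decodeVIntSize returns how many bytes the length prefix itself takes up (one byte for "hello", two bytes for the 200-character field). Their sum is the total number of bytes the first field occupies in the buffer, which is also the offset at which the second field begins. A fixed-width readInt would consume four bytes at that position, so it would misread a prefix that is only one or two bytes long.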