That's the one! I actually didn't know this was how PBs did the variable-length encoding, but it makes sense; it's about the most efficient thing I can imagine.
Values up to 16,383 fit in two bytes, which is less than a 4-byte int and the 3 bytes or so it would take with the other scheme. That could add up over thousands of elements times millions of vectors. Decoding isn't too slow, and if one believes this isn't an unusual encoding, it's not so problematic to use it in a format that others outside Mahout may wish to consume.

On Sun, May 2, 2010 at 5:23 PM, Robin Anil <robin.a...@gmail.com> wrote:
> You mean this type of encoding instead?
> http://code.google.com/apis/protocolbuffers/docs/encoding.html
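
For reference, here is a rough sketch in Java of the base-128 varint scheme under discussion. The class and method names (VarintSketch, encode, decode) are made up for illustration and are not Mahout or protobuf APIs. It treats the int as unsigned and skips error handling for truncated input.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper for illustration only; not part of Mahout or protobuf.
class VarintSketch {

    // Writes the value 7 bits at a time, lowest-order group first.
    // The high bit of each byte is set when more bytes follow.
    static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7; // unsigned shift, so negative ints terminate too
        }
        out.write(value);
        return out.toByteArray();
    }

    // Reassembles the 7-bit groups until a byte without the high bit is seen.
    static int decode(byte[] bytes) {
        int result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
        }
        return result;
    }
}
```

With this sketch, encode(127) is one byte, encode(16383) is two bytes, and encode(16384) is the first value that needs three. Two bytes carry 14 payload bits, which is where the 16,383 limit comes from.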