That's the one! I actually didn't know this was how Protocol Buffers
(PBs) did variable-length encoding, but it makes sense; it's about the
most efficient thing I can imagine.

Values up to 16,383 fit in two bytes, which is less than a 4-byte int
and the 3 bytes or so the other scheme would take. That could add up
over thousands of elements times millions of vectors.
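
To make the byte counts concrete, here is a rough sketch of the base-128
varint scheme as the linked encoding page describes it: 7 payload bits
per byte, low-order group first, high bit set on every byte except the
last. The function names are made up for illustration; this is not
Mahout or protobuf library code, and it only handles non-negative ints.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a base-128 varint (illustrative helper)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        low7 = value & 0x7F          # take the lowest 7 bits
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow: set the high bit
        else:
            out.append(low7)         # last byte: high bit clear
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # place this 7-bit group
        if not (byte & 0x80):             # high bit clear marks the end
            return result, pos
        shift += 7


if __name__ == "__main__":
    for n in (127, 128, 16383, 16384):
        print(n, len(encode_varint(n)), "byte(s)")
```

With 7 usable bits per byte, two bytes cover 2^14 - 1 = 16,383, and
16,384 is the first value that needs a third byte.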

Decoding isn't too slow, and if one believes this isn't an unusual
encoding, it's not so problematic to use it in a format that others
outside Mahout may wish to consume.

On Sun, May 2, 2010 at 5:23 PM, Robin Anil <robin.a...@gmail.com> wrote:
> You mean this type of encoding instead?
>  http://code.google.com/apis/protocolbuffers/docs/encoding.html
