PS: For a full vector, the size of the SparseVector is greater than that of the dense vector. I guess something could be done about it.
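
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes a layout where a dense vector stores one 8-byte double per element and a sparse vector stores an index next to each 8-byte value. The class name, method names and the 4-byte index width are my own assumptions for illustration, not the actual Mahout serialization.

```java
// Hypothetical size estimate; the layout is an assumption, not Mahout's real format.
final class VectorSizeEstimate {
    static final int DOUBLE_BYTES = 8;

    // Dense: one double per element, whether or not it is zero.
    static long denseBytes(long cardinality) {
        return cardinality * DOUBLE_BYTES;
    }

    // Sparse: one index plus one double per non-zero element.
    static long sparseBytes(long nonZeros, int indexBytes) {
        return nonZeros * (indexBytes + DOUBLE_BYTES);
    }

    // True when the sparse form is strictly smaller than the dense form.
    static boolean sparseIsSmaller(long cardinality, long nonZeros, int indexBytes) {
        return sparseBytes(nonZeros, indexBytes) < denseBytes(cardinality);
    }

    public static void main(String[] args) {
        long n = 1000;
        System.out.println("dense bytes:  " + denseBytes(n));
        System.out.println("sparse bytes: " + sparseBytes(n, 4));
    }
}
```

Under these assumptions a full vector pays for every index on top of every value, so the sparse form only wins once fewer than about two thirds of the elements are non-zero. Shorter indices would move that break-even point but not remove it.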
On Sun, May 2, 2010 at 10:03 PM, Sean Owen <[email protected]> wrote:
> That's the one! I actually didn't know this was how PBs did the
> variable length encoding but makes sense, it's about the most
> efficient thing I can imagine.
>
> Values up to 16,383 fit in two bytes, which is less than a 4-byte int and
> the 3 bytes or so it would take the other scheme. Could add up over
> thousands of elements times millions of vectors.
>
> Decoding isn't too slow and if one believes this isn't an unusual
> encoding, it's not so problematic to use it in a format that others
> outside Mahout may wish to consume.
>
> On Sun, May 2, 2010 at 5:23 PM, Robin Anil <[email protected]> wrote:
> > You mean this type of encoding instead?
> > http://code.google.com/apis/protocolbuffers/docs/encoding.html
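
For reference, a minimal sketch of the base-128 varint scheme described on that page: each byte carries 7 bits of the value, least significant group first, and the high bit marks that more bytes follow. The class and method names here are made up for illustration; this is not the Mahout or protobuf code.

```java
// Illustrative base-128 varint codec for int values (names are hypothetical).
final class Varint {
    // Number of bytes the varint form of value occupies.
    // Negative values have their top bits set, so they always take 5 bytes.
    static int size(int value) {
        int n = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            n++;
        }
        return n;
    }

    // Encode value, low 7-bit group first; 0x80 flags a continuation byte.
    static byte[] encode(int value) {
        byte[] out = new byte[size(value)];
        int i = 0;
        while ((value & ~0x7F) != 0) {
            out[i++] = (byte) ((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out[i] = (byte) value;
        return out;
    }

    // Decode a single varint that starts at the beginning of in.
    static int decode(byte[] in) {
        int result = 0;
        int shift = 0;
        for (byte b : in) {
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // last byte of this varint
            }
            shift += 7;
        }
        return result;
    }

    public static void main(String[] args) {
        for (int v : new int[] {127, 128, 16383, 16384}) {
            System.out.println(v + " takes " + size(v) + " byte(s)");
        }
    }
}
```

Two bytes give 14 payload bits, which is where the 16,383 limit comes from; 16,384 is the first value that needs a third byte.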
