PS: The SparseVector is larger than the dense vector when the vector is
full. I guess something could be done about it.
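
To put rough numbers on that, here is a minimal sketch of the arithmetic. It assumes a hypothetical layout, not necessarily what Mahout actually writes: the dense form stores one 8-byte double per position, and the sparse form stores a fixed 4-byte int index plus an 8-byte double per non-zero entry. The class and method names are made up for illustration, and headers are ignored.

```java
// Back-of-the-envelope size comparison under an ASSUMED layout
// (not taken from Mahout's actual serialization code).
class VectorSizeSketch {
    static final int VALUE_BYTES = 8; // one double per stored value
    static final int INDEX_BYTES = 4; // assumed fixed-width int index per sparse entry

    // Dense form: one value for every position, no indices.
    static long denseBytes(int cardinality) {
        return (long) cardinality * VALUE_BYTES;
    }

    // Sparse form: an (index, value) pair for every non-zero entry.
    static long sparseBytes(int nonZeros) {
        return (long) nonZeros * (INDEX_BYTES + VALUE_BYTES);
    }

    public static void main(String[] args) {
        int cardinality = 1000;
        System.out.println("dense:         " + denseBytes(cardinality) + " bytes");
        System.out.println("sparse (full): " + sparseBytes(cardinality) + " bytes");
    }
}
```

Under these assumptions a full vector costs 12 bytes per element in sparse form against 8 in dense form, so the sparse form only wins when fewer than about two thirds of the positions are non-zero. A shorter index encoding would shift that break-even point.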

On Sun, May 2, 2010 at 10:03 PM, Sean Owen <[email protected]> wrote:

> That's the one! I actually didn't know this was how PBs did the
> variable-length encoding, but it makes sense; it's about the most
> efficient thing I can imagine.
>
> Values up to 16,383 fit in two bytes, which is less than a 4-byte int and
> the 3 bytes or so the other scheme would take. That could add up over
> thousands of elements times millions of vectors.
>
> Decoding isn't too slow and if one believes this isn't an unusual
> encoding, it's not so problematic to use it in a format that others
> outside Mahout may wish to consume.
>
> On Sun, May 2, 2010 at 5:23 PM, Robin Anil <[email protected]> wrote:
> > You mean this type of encoding instead?
> >  http://code.google.com/apis/protocolbuffers/docs/encoding.html
>
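
For anyone following along, the encoding discussed above can be sketched roughly as follows. This is a minimal illustration of base-128 variable-length encoding as described on the linked page, not Mahout's or the protobuf library's actual code: each byte carries 7 payload bits, least-significant group first, and the high bit says whether more bytes follow. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;

// Illustrative base-128 varint encoder/decoder for int values.
class VarintSketch {
    // Emits 7 bits per byte, low-order group first; the high bit (0x80)
    // is set on every byte except the last.
    static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7; // unsigned shift so the loop always terminates
        }
        out.write(value);
        return out.toByteArray();
    }

    // Reassembles the 7-bit groups until a byte without the high bit is seen.
    static int decode(byte[] bytes) {
        int result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
        }
        return result;
    }

    public static void main(String[] args) {
        int[] samples = {0, 127, 128, 16383, 16384};
        for (int v : samples) {
            byte[] enc = encode(v);
            System.out.println(v + " takes " + enc.length + " byte(s), decodes to " + decode(enc));
        }
    }
}
```

Two bytes hold 14 payload bits, which is where the 16,383 limit comes from; 16,384 is the first value that needs a third byte.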
