Re: Writables and Inheritance

Ted Dunning Tue, 05 Jan 2010 12:33:26 -0800

"same representation" doesn't have to mean that the representation doesn't
have magic internally.

It just means that if you put the same content into three different kinds of
vectors, you ought to see roughly the same bytes go out over the wire.
This is subject to a few caveats, such as the fact that a dense vector doesn't
really know whether it has only a few non-zero elements. I would be happy if the
serialized form noticed that it had lots of non-zeros and could therefore skip
writing the indexes altogether.
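
A minimal sketch of that adaptive choice follows. It assumes a vector is just a double[] and uses a hypothetical AdaptiveVectorWriter class with a one-byte layout flag. This is not Mahout's actual VectorWritable format. The writer counts the non-zeros and compares the two costs from the quoted message: size() * 8 bytes for a dense layout versus roughly nonZeros * (4 + 8) bytes for index/value pairs. Sparse therefore wins only while fewer than about two thirds of the entries are non-zero.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical sketch: not Mahout's real serialization format.
final class AdaptiveVectorWriter {
    static final byte DENSE = 0;
    static final byte SPARSE = 1;

    // Writes the vector in whichever layout is smaller on the wire.
    static void write(double[] values, DataOutputStream out) throws IOException {
        int nonZeros = 0;
        for (double v : values) {
            if (v != 0.0) nonZeros++;
        }
        long denseBytes = 8L * values.length;        // one double per entry
        long sparseBytes = 4L + 12L * nonZeros;      // count prefix + (int index, double value) pairs
        out.writeInt(values.length);
        if (sparseBytes < denseBytes) {
            out.writeByte(SPARSE);
            out.writeInt(nonZeros);
            for (int i = 0; i < values.length; i++) {
                if (values[i] != 0.0) {
                    out.writeInt(i);
                    out.writeDouble(values[i]);
                }
            }
        } else {
            out.writeByte(DENSE);                    // no indexes needed at all
            for (double v : values) out.writeDouble(v);
        }
    }

    // Reads back either layout into a plain double[].
    static double[] read(DataInputStream in) throws IOException {
        int size = in.readInt();
        byte layout = in.readByte();
        double[] values = new double[size];
        if (layout == SPARSE) {
            int count = in.readInt();
            for (int k = 0; k < count; k++) {
                int index = in.readInt();
                values[index] = in.readDouble();
            }
        } else {
            for (int i = 0; i < size; i++) values[i] = in.readDouble();
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        double[] mostlyZero = new double[100];
        mostlyZero[3] = 1.5;
        double[] full = new double[100];
        Arrays.fill(full, 2.0);
        for (double[] vec : new double[][] {mostlyZero, full}) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            write(vec, new DataOutputStream(buf));
            System.out.println(buf.size()); // 21 for mostlyZero, 805 for full
        }
    }
}
```

Note that the same content produces the same bytes regardless of which in-memory vector class held it, which is the sense of "same representation" above.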

It might also be that we should write the indexes using a compressed bit
vector format such as a run-length encoding. That keeps the overhead low for
both very sparse and very dense vectors.
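
One possible reading of the run-length idea, as a hedged sketch: treat the non-zero positions as a bit mask, store alternating run lengths (always starting with a possibly empty run of zeros), and write each length as a 7-bits-per-byte varint. The class RunLengthIndexes and this exact layout are illustrative assumptions, not an existing Mahout or Hadoop format. A vector with a couple of scattered non-zeros needs only a few runs, and a fully dense vector collapses to a single run.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: run-length encode which positions of a vector are non-zero.
final class RunLengthIndexes {
    // Alternating run lengths, starting with a (possibly empty) run of zeros.
    static List<Integer> runs(double[] values) {
        List<Integer> result = new ArrayList<>();
        boolean inNonZeroRun = false;
        int length = 0;
        for (double v : values) {
            boolean nonZero = v != 0.0;
            if (nonZero == inNonZeroRun) {
                length++;
            } else {
                result.add(length);
                inNonZeroRun = nonZero;
                length = 1;
            }
        }
        result.add(length);
        return result;
    }

    // Unsigned varint: 7 bits per byte, high bit set when more bytes follow.
    static void writeVarInt(int value, ByteArrayOutputStream out) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    static int readVarInt(byte[] data, int[] pos) {
        int value = 0;
        int shift = 0;
        while (true) {
            int b = data[pos[0]++] & 0xFF;
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return value;
            shift += 7;
        }
    }

    // Layout: number of runs, then each run length.
    static byte[] encode(double[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        List<Integer> runs = runs(values);
        writeVarInt(runs.size(), out);
        for (int run : runs) writeVarInt(run, out);
        return out.toByteArray();
    }

    // Expands the runs back into the list of non-zero positions.
    static int[] decodeIndexes(byte[] encoded) {
        int[] pos = {0};
        int count = readVarInt(encoded, pos);
        List<Integer> indexes = new ArrayList<>();
        int position = 0;
        for (int r = 0; r < count; r++) {
            int run = readVarInt(encoded, pos);
            if (r % 2 == 1) { // odd-numbered runs are the non-zero ones
                for (int i = 0; i < run; i++) indexes.add(position + i);
            }
            position += run;
        }
        return indexes.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        double[] verySparse = new double[10_000];
        verySparse[10] = 1.0;
        verySparse[5_000] = 2.0;
        double[] veryDense = new double[10_000];
        Arrays.fill(veryDense, 1.0);
        System.out.println(encode(verySparse).length); // 8 bytes
        System.out.println(encode(veryDense).length);  // 4 bytes, versus 40,000 bytes of int indexes
    }
}
```

The weak spot is a vector whose zeros and non-zeros alternate, which produces one run per element; the adaptive dense/sparse choice could still be layered on top to cover that case.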

On Tue, Jan 5, 2010 at 8:38 AM, Jake Mannix <jake.man...@gmail.com> wrote:

> > I would imagine the serialized form of a vector is the same for
> > SparseVector, DenseVector, etc. There's no question of representation.
> > You write out all the non-default elements.
> >
>
> This will be twice as large in the dense case (there's no need to write out
> indices). OK, not twice as large, but size() * (4 + 8) instead of size() * 8.
> That's a pretty significant cost in terms of disk space and IO time.

-- 
Ted Dunning, CTO
DeepDyve
