Re: Profiling SequentialAccessSparseVector

Jake Mannix Sat, 20 Feb 2010 07:25:34 -0800

On Sat, Feb 20, 2010 at 5:25 AM, Robin Anil <robin.a...@gmail.com> wrote:


> +1 for more tests of the Vector implementations. Really, if vectors start
> acting weirdly, there is no way we can debug an ML algorithm, and less so on
> top of a distributed system. Like Grant once said, debugging such a system
> would result in loss of hair.
>
> I am OK with pulling the caching optimisations out of vector into another
> layer. It's the algorithm's responsibility to optimise based on its usage
> patterns, and for that we need both mutable and immutable operators,
> something like:
> v1.plus(v2) immutable
> v1.plusMutable(v2);
>

Last night I added a ton of tests verifying that every mutating method
blows away the cache. They're in VectorTest.testGetLengthSquared(). If I'm
missing any mutating methods, just add them. We made a mistake in not
having coverage of this, but once we have full coverage, this should be
fine.

-1 to pulling the caching out of this. It's not hard to add unit tests to
check for this kind of thing. It is indeed a performance win: if you have
to call getLengthSquared() often, each call after the first saves 2N
operations and replaces them with 1. If the vector is kept immutable and
reused a ton (for example, in eigensystem solving), this is humongous. If
you're doing clustering, getLengthSquared() is also called a lot.
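
To make the trade-off concrete, here is a minimal sketch of the caching
pattern under discussion. It assumes a hypothetical dense CachedVector
class (the class, its fields, and the recomputation counter are made up
for illustration and are not Mahout's actual Vector code): the squared
length is computed once, reused until a mutating method marks it stale,
and then lazily recomputed.

```java
// Hypothetical illustration only; not Mahout's Vector API.
final class CachedVector {
    private final double[] values;
    // A negative value means "not computed yet, or invalidated";
    // a real squared length is never negative.
    private double lengthSquared = -1.0;
    // Counts full passes over the data, to make the saving visible.
    private int recomputations = 0;

    CachedVector(double[] values) {
        this.values = values.clone();
    }

    double getLengthSquared() {
        if (lengthSquared >= 0.0) {
            return lengthSquared; // cache hit: no pass over the elements
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v * v; // one multiply and one add per element
        }
        recomputations++;
        lengthSquared = sum;
        return sum;
    }

    // Every mutating method must invalidate the cache.
    void set(int index, double value) {
        values[index] = value;
        lengthSquared = -1.0;
    }

    int getRecomputations() {
        return recomputations;
    }
}
```

Calling getLengthSquared() repeatedly on an unchanged vector triggers only
one full pass; a call to set() forces exactly one more on the next read.
The risk is the one already raised in this thread: a mutating method that
forgets to reset the cached value silently returns a stale result, which
is why each mutator needs a test.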

And we do have v1.plus() vs. v1.plusMutable() - the latter is addTo().
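
As a rough sketch of that distinction, assuming a hypothetical
SimpleVector class (not Mahout's real implementation, and with the
direction of addTo() chosen here as an assumption): plus() allocates a new
vector and leaves both operands untouched, while addTo() accumulates in
place into its argument without allocating.

```java
// Hypothetical illustration only; not Mahout's Vector API.
final class SimpleVector {
    private final double[] values;

    SimpleVector(double... values) {
        this.values = values.clone();
    }

    // Immutable-style operator: returns a new vector, operands unchanged.
    SimpleVector plus(SimpleVector other) {
        double[] sum = values.clone();
        for (int i = 0; i < sum.length; i++) {
            sum[i] += other.values[i];
        }
        return new SimpleVector(sum);
    }

    // Mutating operator: adds this vector's elements into 'target' in place.
    // (Assumed direction for this sketch; no new vector is allocated.)
    void addTo(SimpleVector target) {
        for (int i = 0; i < values.length; i++) {
            target.values[i] += values[i];
        }
    }

    // Defensive copy so callers cannot mutate the internal state.
    double[] toArray() {
        return values.clone();
    }
}
```

The sketch assumes both vectors have the same length and does no size
checking.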

  -jake
