On Sat, Feb 20, 2010 at 5:25 AM, Robin Anil <robin.a...@gmail.com> wrote:
> +1 for more tests to the Vector implementations. Really, if vectors start
> acting weirdly there is no way we can debug a ML algorithm and less so on
> top of a distributed system. Like Grant once said, debugging such a system
> would result in loss of hair.
>
> I am ok with pulling out caching optimisations out of vector to another
> layer. It's the algorithm's responsibility to optimise based on its usage
> patterns and for that we need both mutable and immutable operators,
> something like
> v1.plus(v2) immutable
> v1.plusMutable(v2);
>

Last night I added a ton of tests verifying that every mutating method blows
away the cache. They're in VectorTest.testGetLengthSquared(). If I'm missing
any mutating methods, just add them. We made a mistake in not having coverage
of this, but once we have full coverage, this should be fine.

-1 to pulling the caching out of Vector. It's not hard to add unit tests to
check for this kind of thing. It is indeed a performance win - if you have to
call getLengthSquared() often, you're saving 2N operations and replacing them
with 1. If the vector is kept immutable and reused a ton (for example, in
eigensystem solving), this is humongous. If you're doing clustering,
getLengthSquared() is also called a lot.

And we do have v1.plus() vs. v1.plusMutable() - the latter is addTo().

  -jake
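
For illustration only, here is a minimal sketch of the pattern discussed in this thread: a cached squared length that every mutating method invalidates. The class CachedVector, its plusInPlace() method, and the recomputations counter are hypothetical names made up for this example. This is not Mahout's Vector code, and it makes no claim about how addTo() is actually implemented.

```java
import java.util.Arrays;

/** Hypothetical dense vector that caches its squared length (not Mahout's Vector). */
final class CachedVector {
    private final double[] values;
    // A negative value means "not computed yet"; a squared length is never negative.
    private double lengthSquared = -1.0;
    // Counts full recomputations so that the effect of the cache is observable.
    int recomputations = 0;

    CachedVector(double[] values) {
        this.values = Arrays.copyOf(values, values.length);
    }

    double get(int i) {
        return values[i];
    }

    /** Mutating: changes an element, so the cache must be invalidated. */
    void set(int i, double x) {
        values[i] = x;
        lengthSquared = -1.0;
    }

    /** Immutable: returns a new vector and leaves this one (and its cache) untouched. */
    CachedVector plus(CachedVector other) {
        checkSize(other);
        double[] sum = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            sum[i] = values[i] + other.values[i];
        }
        return new CachedVector(sum);
    }

    /** Mutating: adds other into this vector, so the cache must be invalidated. */
    void plusInPlace(CachedVector other) {
        checkSize(other);
        for (int i = 0; i < values.length; i++) {
            values[i] += other.values[i];
        }
        lengthSquared = -1.0;
    }

    /** Costs about 2N operations on a cache miss and a single read on a hit. */
    double getLengthSquared() {
        if (lengthSquared < 0) {
            double sum = 0.0;
            for (double v : values) {
                sum += v * v;
            }
            lengthSquared = sum;
            recomputations++;
        }
        return lengthSquared;
    }

    private void checkSize(CachedVector other) {
        if (other.values.length != values.length) {
            throw new IllegalArgumentException("vector sizes differ");
        }
    }
}
```

A unit test in the spirit of testGetLengthSquared() would call getLengthSquared(), apply each mutating method in turn, and check that the next call returns the freshly computed value rather than the stale one.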