I have made all the changes; take a look. The same could be done for fuzzy k-means, Dirichlet and LDA. I haven't had time to look at the internals yet.
On Fri, Feb 19, 2010 at 3:35 AM, Robin Anil <robin.a...@gmail.com> wrote:

> 2 second canopy clustering over reuters :D
>
> On Fri, Feb 19, 2010 at 3:33 AM, Robin Anil <robin.a...@gmail.com> wrote:
>
>> This really doesn't work for me; I can't modify any vectors inside the
>> distance measure. So I have written a subtract inside the Manhattan
>> distance itself. Works great for now.
>>
>> On Fri, Feb 19, 2010 at 3:10 AM, Jake Mannix <jake.man...@gmail.com> wrote:
>>
>>> currentVector.assign(otherVector, minus) takes the other vector and
>>> subtracts it from currentVector, which mutates currentVector. If
>>> currentVector is a DenseVector, this is already optimized. It could be
>>> optimized if currentVector is RandomAccessSparse.
>>>
>>> -jake
>>>
>>> On Thu, Feb 18, 2010 at 1:29 PM, Robin Anil <robin.a...@gmail.com> wrote:
>>>
>>> > Just to be clear, this does:
>>> > currentVector - otherVector ?
>>> >
>>> > currentVector.assign(otherVector, Functions.minus);
>>> >
>>> > On Fri, Feb 19, 2010 at 2:57 AM, Jake Mannix <jake.man...@gmail.com> wrote:
>>> >
>>> > > To do subtractFrom, you can instead just do
>>> > >
>>> > > Vector.assign(otherVector, Functions.minus);
>>> > >
>>> > > The problem is that while DenseVector has an optimization here: if the
>>> > > BinaryFunction passed in is additive (it's an instance of PlusMult),
>>> > > sparse iteration over "otherVector" is executed, applying the binary
>>> > > function and mutating self. AbstractVector should have this optimization
>>> > > in general, as it would be useful in RandomAccessSparseVector (although
>>> > > not terribly useful in SequentialAccessSparseVector, but still better
>>> > > than current).
>>> > >
>>> > > -jake
>>> > >
>>> > > On Thu, Feb 18, 2010 at 1:19 PM, Robin Anil <robin.a...@gmail.com> wrote:
>>> > >
>>> > > > I just had to change it at one place (and the tests pass, which is
>>> > > > scary). Canopy is really fast now :). Still could be pushed.
>>> > > > Now the bottleneck is minus.
>>> > > >
>>> > > > Maybe a subtractFrom on the lines of addTo? Or a mutable negate
>>> > > > function for vector, before adding to.
>>> > > >
>>> > > > Robin
>>> > > >
>>> > > > On Fri, Feb 19, 2010 at 2:43 AM, Jake Mannix <jake.man...@gmail.com> wrote:
>>> > > >
>>> > > > > I use it (addTo) in decomposer, for exactly this performance issue.
>>> > > > > Changing plus into addTo requires care: since plus() leaves
>>> > > > > arguments immutable, there may be code which *assumes* that this is
>>> > > > > the case, and doing addTo() leaves side effects which might not be
>>> > > > > expected. This bit me hard on svd migration, because I had other
>>> > > > > assumptions about mutability in there.
>>> > > > >
>>> > > > > -jake
>>> > > > >
>>> > > > > On Thu, Feb 18, 2010 at 1:09 PM, Robin Anil <robin.a...@gmail.com> wrote:
>>> > > > >
>>> > > > > > Ah! It's not being used anywhere :). Should we make that a big
>>> > > > > > task before 0.3? Sweep through the code (mainly clustering) and
>>> > > > > > change all these things.
>>> > > > > >
>>> > > > > > Robin
>>> > > > > >
>>> > > > > > On Fri, Feb 19, 2010 at 2:36 AM, Sean Owen <sro...@gmail.com> wrote:
>>> > > > > >
>>> > > > > > > Isn't this basically what assign() is for?
>>> > > > > > >
>>> > > > > > > On Thu, Feb 18, 2010 at 9:04 PM, Robin Anil <robin.a...@gmail.com> wrote:
>>> > > > > > > > Now the big perf bottleneck is immutability.
>>> > > > > > > >
>>> > > > > > > > Say for plus, it's doing vector.clone() before doing anything
>>> > > > > > > > else. There should be both immutable and mutable plus functions.
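
For anyone following the thread, the trade-off being discussed (an immutable operation that clones first, versus a mutating one, versus computing the distance with no intermediate vector at all) can be sketched roughly as below. This is only an illustration under stated assumptions: `DenseVectorSketch`, `minus`, `subtractInPlace` and `manhattanDistance` are hypothetical names made up for this example, not Mahout's Vector, assign() or distance measure code.

```java
/**
 * Hypothetical stand-in for a dense vector. This is NOT Mahout's Vector API;
 * every name in this class is made up for illustration.
 */
final class DenseVectorSketch {
    final double[] values;

    DenseVectorSketch(double... values) {
        // Defensive copy so callers cannot alias our storage.
        this.values = values.clone();
    }

    /** Immutable-style minus: copies this vector, then subtracts. Allocates on every call. */
    DenseVectorSketch minus(DenseVectorSketch other) {
        return new DenseVectorSketch(values).subtractInPlace(other);
    }

    /** Mutating subtract (this -= other): no allocation, but the receiver changes. */
    DenseVectorSketch subtractInPlace(DenseVectorSketch other) {
        requireSameSize(this, other);
        for (int i = 0; i < values.length; i++) {
            values[i] -= other.values[i];
        }
        return this;
    }

    /** Manhattan distance computed element by element; neither vector is copied or modified. */
    static double manhattanDistance(DenseVectorSketch a, DenseVectorSketch b) {
        requireSameSize(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.values.length; i++) {
            sum += Math.abs(a.values[i] - b.values[i]);
        }
        return sum;
    }

    private static void requireSameSize(DenseVectorSketch a, DenseVectorSketch b) {
        if (a.values.length != b.values.length) {
            throw new IllegalArgumentException("size mismatch");
        }
    }
}
```

The caveat Jake raises about addTo applies to `subtractInPlace` here as well: any caller that assumed the receiver stays unchanged will see side effects, which is why the last variant, which touches neither argument, is the safest fit inside a distance measure.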