Re: Profiling SequentialAccessSparseVector

Robin Anil Thu, 18 Feb 2010 14:25:41 -0800

I have made all the changes; take a look. The same could be done for fuzzy
k-means, Dirichlet and LDA. I haven't had time to look at their internals yet.




On Fri, Feb 19, 2010 at 3:35 AM, Robin Anil <robin.a...@gmail.com> wrote:

> 2-second canopy clustering over Reuters :D
>
>
>
> On Fri, Feb 19, 2010 at 3:33 AM, Robin Anil <robin.a...@gmail.com> wrote:
>
>> This doesn't really work here: I can't modify any vectors inside a distance
>> measure. So I have written the subtraction inside the Manhattan distance
>> itself. Works great for now.
>>
>>
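Computing the difference on the fly inside the distance measure, instead of materializing currentVector - otherVector, might look roughly like the minimal sketch below. It assumes each sparse vector is stored as parallel arrays of ascending `int` indices and `double` values (similar in spirit to SequentialAccessSparseVector). The `SparseManhattan` class is hypothetical and is not Mahout's actual distance-measure code.

```java
/** Hypothetical helper: L1 distance between two sparse vectors without a temporary difference vector. */
final class SparseManhattan {
  private SparseManhattan() {}

  /**
   * Each vector is given as ascending indices plus matching values; missing
   * indices are implicitly zero. Both index lists are walked in one merge pass.
   */
  static double distance(int[] aIdx, double[] aVal, int[] bIdx, double[] bVal) {
    double sum = 0.0;
    int i = 0;
    int j = 0;
    while (i < aIdx.length && j < bIdx.length) {
      if (aIdx[i] == bIdx[j]) {
        // Index present in both vectors: subtract element-wise, no allocation.
        sum += Math.abs(aVal[i++] - bVal[j++]);
      } else if (aIdx[i] < bIdx[j]) {
        // Only a has a value here; b is implicitly zero.
        sum += Math.abs(aVal[i++]);
      } else {
        // Only b has a value here; a is implicitly zero.
        sum += Math.abs(bVal[j++]);
      }
    }
    // Drain whichever vector still has entries left.
    while (i < aIdx.length) {
      sum += Math.abs(aVal[i++]);
    }
    while (j < bIdx.length) {
      sum += Math.abs(bVal[j++]);
    }
    return sum;
  }
}
```

For a = {0: 1, 3: 2} and b = {3: 5, 7: -1}, the merge visits indices 0, 3 and 7 and returns 1 + 3 + 1 = 5, and neither input is modified.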
>> On Fri, Feb 19, 2010 at 3:10 AM, Jake Mannix <jake.man...@gmail.com> wrote:
>>
>>> currentVector.assign(otherVector, minus) takes the other vector and
>>> subtracts it from currentVector, which mutates currentVector. If
>>> currentVector is a DenseVector, this is already optimized. It could also be
>>> optimized if currentVector is a RandomAccessSparseVector.
>>>
>>>  -jake
>>>
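To make the semantics concrete: assign(other, f) overwrites each element of the receiver with f(this[i], other[i]), so with a minus function the receiver ends up holding currentVector - otherVector and no new vector is allocated. The sketch below is a minimal illustration, assuming a hypothetical `DenseVec` class and using `java.util.function.DoubleBinaryOperator` as a stand-in for Mahout's BinaryFunction.

```java
import java.util.Arrays;
import java.util.function.DoubleBinaryOperator;

/** Hypothetical dense vector, used only to illustrate assign()'s mutating semantics. */
final class DenseVec {
  final double[] values;

  DenseVec(double... values) {
    this.values = values.clone();
  }

  /**
   * Mutating element-wise combine: this[i] = f(this[i], other[i]).
   * Returns this so calls can be chained; other is left untouched.
   */
  DenseVec assign(DenseVec other, DoubleBinaryOperator f) {
    if (other.values.length != values.length) {
      throw new IllegalArgumentException("cardinality mismatch");
    }
    for (int i = 0; i < values.length; i++) {
      values[i] = f.applyAsDouble(values[i], other.values[i]);
    }
    return this;
  }

  @Override
  public String toString() {
    return Arrays.toString(values);
  }
}
```

With `current = new DenseVec(5, 7, 9)` and `other = new DenseVec(1, 2, 3)`, calling `current.assign(other, (a, b) -> a - b)` leaves current as [4.0, 5.0, 6.0] and other unchanged.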
>>> On Thu, Feb 18, 2010 at 1:29 PM, Robin Anil <robin.a...@gmail.com> wrote:
>>>
>>> > Just to be clear, this computes currentVector - otherVector?
>>> >
>>> > currentVector.assign(otherVector, Functions.minus);
>>> >
>>> >
>>> >
>>> > On Fri, Feb 19, 2010 at 2:57 AM, Jake Mannix <jake.man...@gmail.com> wrote:
>>> >
>>> > > To get the effect of subtractFrom, you can instead just do
>>> > >
>>> > >  vector.assign(otherVector, Functions.minus);
>>> > >
>>> > > The problem is that DenseVector has an optimization here, but the other
>>> > > implementations don't: if the BinaryFunction passed in is additive (i.e.
>>> > > it's an instance of PlusMult), DenseVector iterates sparsely over
>>> > > otherVector, applying the binary function and mutating itself.
>>> > > AbstractVector should have this optimization in general, as it would be
>>> > > useful in RandomAccessSparseVector (it is not terribly useful in
>>> > > SequentialAccessSparseVector, but still better than the current
>>> > > behavior).
>>> > >
>>> > >  -jake
>>> > >
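The reason the additive case can skip work: for an additive function f(a, b) = a + m * b, f(a, 0) = a, so zero entries of otherVector cannot change the receiver and only otherVector's non-zeros need visiting. A minimal sketch of that dispatch follows; `PlusMult` and `SparseAssign` here are hypothetical stand-ins written for illustration (Mahout has its own PlusMult class, which this does not reproduce), with subtraction expressed as `new PlusMult(-1)`.

```java
import java.util.function.DoubleBinaryOperator;

/** Hypothetical additive function: f(a, b) = a + multiplier * b. */
final class PlusMult implements DoubleBinaryOperator {
  final double multiplier;

  PlusMult(double multiplier) {
    this.multiplier = multiplier;
  }

  @Override
  public double applyAsDouble(double a, double b) {
    return a + multiplier * b;
  }
}

final class SparseAssign {
  private SparseAssign() {}

  /**
   * self[i] = f(self[i], other[i]) for every i, where other is sparse
   * (ascending otherIdx with values otherVal; every other entry is 0).
   * Returns how many times f was applied, to make the saving visible.
   */
  static int assign(double[] self, int[] otherIdx, double[] otherVal, DoubleBinaryOperator f) {
    int calls = 0;
    if (f instanceof PlusMult) {
      // Additive: f(a, 0) == a, so only other's non-zero entries matter.
      for (int k = 0; k < otherIdx.length; k++) {
        int i = otherIdx[k];
        self[i] = f.applyAsDouble(self[i], otherVal[k]);
        calls++;
      }
    } else {
      // General case: f(a, 0) may differ from a, so every index is visited.
      int k = 0;
      for (int i = 0; i < self.length; i++) {
        double b = 0.0;
        if (k < otherIdx.length && otherIdx[k] == i) {
          b = otherVal[k++];
        }
        self[i] = f.applyAsDouble(self[i], b);
        calls++;
      }
    }
    return calls;
  }
}
```

Subtracting a vector with two non-zeros from a length-5 dense target then costs two function applications instead of five; a non-additive function such as multiplication still touches every element.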
>>> > > On Thu, Feb 18, 2010 at 1:19 PM, Robin Anil <robin.a...@gmail.com> wrote:
>>> > >
>>> > > > I just had to change it in one place (and the tests pass, which is
>>> > > > scary). Canopy is really fast now :). It could still be pushed further.
>>> > > > Now the bottleneck is minus.
>>> > > >
>>> > > > Maybe a subtractFrom along the lines of addTo? Or a mutable negate
>>> > > > function for Vector, applied before addTo?
>>> > > >
>>> > > > Robin
>>> > > >
>>> > > >
>>> > > >
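The two options above differ in which vector gets mutated. Negating first flips the sign of otherVector, which a caller that doesn't own that vector cannot allow; a dedicated in-place subtract touches only the receiver. A minimal sketch with a hypothetical `MVec` class (none of these method names are Mahout's API):

```java
/** Hypothetical dense vector used to compare two ways of computing this -= o in place. */
final class MVec {
  final double[] v;

  MVec(double... v) {
    this.v = v.clone();
  }

  /** Mutating negate: flips the sign of every element of this vector. */
  MVec negateInPlace() {
    for (int i = 0; i < v.length; i++) {
      v[i] = -v[i];
    }
    return this;
  }

  /** Mutating add: this += o; o is not modified. */
  MVec addInPlace(MVec o) {
    for (int i = 0; i < v.length; i++) {
      v[i] += o.v[i];
    }
    return this;
  }

  /** Mutating subtract: this -= o in a single pass; o is not modified. */
  MVec subtractInPlace(MVec o) {
    for (int i = 0; i < v.length; i++) {
      v[i] -= o.v[i];
    }
    return this;
  }
}
```

Both `a.addInPlace(b.negateInPlace())` and `a.subtractInPlace(b)` leave a holding a - b, but the first also leaves b negated (and takes two passes), so it only fits when b is a throwaway.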
>>> > > > On Fri, Feb 19, 2010 at 2:43 AM, Jake Mannix <jake.man...@gmail.com> wrote:
>>> > > >
>>> > > > > I use it (addTo) in decomposer, for exactly this performance issue.
>>> > > > > Changing plus into addTo requires care: since plus() leaves its
>>> > > > > arguments unchanged, there may be code which *assumes* that this is
>>> > > > > the case, and addTo() has side effects which might not be expected.
>>> > > > > This bit me hard during the SVD migration, because I had other
>>> > > > > assumptions about mutability in there.
>>> > > > >
>>> > > > >  -jake
>>> > > > >
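The hazard is aliasing: code written against a non-mutating plus() may keep reading an operand afterwards, and a mechanical switch to a mutating call silently changes what it reads. A minimal sketch with hypothetical `Vec`, `plusInPlace` and `AliasingDemo` names (not Mahout's API; `plusInPlace` mutates its receiver and is not meant to mirror addTo's exact contract):

```java
/** Hypothetical minimal vector with a non-mutating plus() and a mutating plusInPlace(). */
final class Vec {
  final double[] v;

  Vec(double... v) {
    this.v = v.clone();
  }

  /** Returns a new vector; neither operand changes. */
  Vec plus(Vec o) {
    Vec r = new Vec(v);
    for (int i = 0; i < v.length; i++) {
      r.v[i] += o.v[i];
    }
    return r;
  }

  /** Adds o into this vector and returns this: a side effect on the receiver. */
  Vec plusInPlace(Vec o) {
    for (int i = 0; i < v.length; i++) {
      v[i] += o.v[i];
    }
    return this;
  }
}

final class AliasingDemo {
  private AliasingDemo() {}

  /** Caller code written assuming 'center' is left untouched by the addition. */
  static double[] shiftAndKeepOriginal(Vec center, Vec delta, boolean useMutating) {
    Vec shifted = useMutating ? center.plusInPlace(delta) : center.plus(delta);
    // With the mutating call, 'shifted' and 'center' are the same object.
    return new double[] {center.v[0], shifted.v[0]};
  }
}
```

With center = [1] and delta = [2], the non-mutating path reports the original as 1.0 and the shifted value as 3.0; the mutating path reports 3.0 for both, which is exactly the kind of unexpected side effect described above.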
>>> > > > > On Thu, Feb 18, 2010 at 1:09 PM, Robin Anil <robin.a...@gmail.com> wrote:
>>> > > > >
>>> > > > > > Ah! It's not being used anywhere :). Should we make that a big task
>>> > > > > > before 0.3? Sweep through the code (mainly clustering) and change all
>>> > > > > > these things.
>>> > > > > >
>>> > > > > > Robin
>>> > > > > >
>>> > > > > >
>>> > > > > >
>>> > > > > > On Fri, Feb 19, 2010 at 2:36 AM, Sean Owen <sro...@gmail.com> wrote:
>>> > > > > >
>>> > > > > > > Isn't this basically what assign() is for?
>>> > > > > > >
>>> > > > > > > On Thu, Feb 18, 2010 at 9:04 PM, Robin Anil <robin.a...@gmail.com> wrote:
>>> > > > > > > > Now the big perf bottleneck is immutability.
>>> > > > > > > >
>>> > > > > > > > For example, plus does vector.clone() before doing anything
>>> > > > > > > > else. There should be both immutable and mutable plus functions.
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
>>
>>
>
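One common way to offer both variants without duplicating loops is to make the mutating method the primitive and build the non-mutating one as "copy, then mutate the copy", so the clone cost is paid only by callers who ask for immutability. A minimal sketch under that assumption, using a hypothetical `DualVec` class (not Mahout's Vector API):

```java
/** Hypothetical vector offering both mutating and non-mutating arithmetic. */
final class DualVec {
  private final double[] v;

  DualVec(double... v) {
    this.v = v.clone();
  }

  double get(int i) {
    return v[i];
  }

  /** Deep copy: the only extra allocation the non-mutating variants pay for. */
  DualVec copy() {
    return new DualVec(v);
  }

  /** Mutating: this += o, returns this. No allocation. */
  DualVec plusInPlace(DualVec o) {
    checkSize(o);
    for (int i = 0; i < v.length; i++) {
      v[i] += o.v[i];
    }
    return this;
  }

  /** Mutating: this -= o, returns this. No allocation. */
  DualVec minusInPlace(DualVec o) {
    checkSize(o);
    for (int i = 0; i < v.length; i++) {
      v[i] -= o.v[i];
    }
    return this;
  }

  /** Non-mutating: copy first, then reuse the mutating loop on the copy. */
  DualVec plus(DualVec o) {
    return copy().plusInPlace(o);
  }

  /** Non-mutating: copy first, then reuse the mutating loop on the copy. */
  DualVec minus(DualVec o) {
    return copy().minusInPlace(o);
  }

  private void checkSize(DualVec o) {
    if (o.v.length != v.length) {
      throw new IllegalArgumentException("cardinality mismatch");
    }
  }
}
```

Hot loops such as canopy or k-means distance updates can then call the in-place variants on vectors they own, while existing callers of plus() and minus() keep their no-side-effect guarantee.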
