Now the big perf bottleneck is immutability.

Say, for plus, it's doing vector.clone() before doing anything else.
There should be both immutable and mutable plus functions.
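A minimal sketch of the two flavors, assuming a hypothetical dense vector class `DenseVec` (not Mahout's Vector API). `plus()` copies first and leaves both operands untouched, while `plusInPlace()` (a hypothetical name) writes into the receiver and allocates nothing:

```java
import java.util.Arrays;

/** Hypothetical minimal vector for illustration; not Mahout's Vector API. */
final class DenseVec {
  final double[] values;

  DenseVec(double... values) {
    this.values = values;
  }

  /** Immutable style: copy this vector up front, then add; neither operand changes. */
  DenseVec plus(DenseVec other) {
    DenseVec result = new DenseVec(values.clone()); // the per-call copy being profiled
    return result.plusInPlace(other);
  }

  /** Mutable style (hypothetical name): add other into this vector, return this, no allocation. */
  DenseVec plusInPlace(DenseVec other) {
    if (other.values.length != values.length) {
      throw new IllegalArgumentException("size mismatch");
    }
    for (int i = 0; i < values.length; i++) {
      values[i] += other.values[i];
    }
    return this;
  }

  @Override
  public String toString() {
    return Arrays.toString(values);
  }
}
```

In an accumulation loop the in-place variant saves one copy per addition; the trade-off is that the caller must own the vector being mutated and must not share it.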

Robin



On Fri, Feb 19, 2010 at 2:07 AM, Jake Mannix <jake.man...@gmail.com> wrote:

> I dunno, we can file it for whenever, 0.4, and if it turns out it's a really
> easy change we can always commit it for 0.3.
>
>  -jake
>
> On Thu, Feb 18, 2010 at 12:29 PM, Robin Anil <robin.a...@gmail.com> wrote:
>
> > File it for 0.3?
> >
> >
> > Robin
> >
> > On Fri, Feb 19, 2010 at 1:56 AM, Jake Mannix <jake.man...@gmail.com> wrote:
> >
> > > On Thu, Feb 18, 2010 at 11:55 AM, Robin Anil <robin.a...@gmail.com> wrote:
> > >
> > > > I was trying out SeqAccessSparseVector on Canopy Clustering using
> > > > Manhattan distance. I found performance to be really bad, so I profiled it
> > > > with YourKit (thanks a lot for providing us a free license).
> > > >
> > > > Since I was trying out Manhattan distance, there were a lot of A-B
> > > > operations, which created a lot of clone operations: 5% of the total time.
> > > > There were also many A+B operations for adding a point to the canopy to
> > > > compute the average, and these were also creating a lot of clone
> > > > operations: 90% of the total time.
> > > >
> > >
> > > SequentialAccessSparseVector should only be used in a read-only fashion.
> > > If you are creating an average centroid which is sparse but is mutating,
> > > then it should be RandomAccessSparseVector. The points which are being used
> > > to create it can be SequentialAccessSparseVector (if they themselves never
> > > change), but then the method called should be
> > > SequentialAccessSparseVector.addTo(RandomAccessSparseVector). This exploits
> > > the fast sequential iteration of SeqAcc and the fast random-access
> > > mutability of RandAcc.
> > >
> > >
> > > >
> > > > So we definitely need to improve that.
> > > >
> > > > As a small hack, I made the cluster centers RandomAccess vectors. Things
> > > > are fast again. I don't know whether to commit it or not, but it's
> > > > something to look into in 0.4?
> > > >
> > >
> > > Yeah, cluster *centers* should indeed be RandomAccess. JIRA / patch so we
> > > can see exactly what the change is?
> > >
> > >  -jake
> > >
> >
>
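
Jake's suggested pattern above (read-only SequentialAccessSparseVector points added into a mutable RandomAccessSparseVector center through addTo) can be sketched in plain Java. `SeqSparseVec`, `RandAccSparseVec` and `Centroids` are hypothetical stand-ins, not Mahout classes: sorted index/value arrays play the sequential-access role and a `HashMap` plays the random-access role.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical read-only sparse vector: sorted indices with parallel values, cheap to iterate in order. */
final class SeqSparseVec {
  final int[] indices;
  final double[] values;

  SeqSparseVec(int[] indices, double[] values) {
    if (indices.length != values.length) {
      throw new IllegalArgumentException("indices and values differ in length");
    }
    this.indices = indices;
    this.values = values;
  }

  /** Walks this vector's nonzeros in order and adds each into target; this vector is never copied. */
  void addTo(RandAccSparseVec target) {
    for (int k = 0; k < indices.length; k++) {
      target.add(indices[k], values[k]);
    }
  }
}

/** Hypothetical mutable sparse vector: hash-backed, cheap random get and update. */
final class RandAccSparseVec {
  final Map<Integer, Double> entries = new HashMap<>();

  double get(int index) {
    return entries.getOrDefault(index, 0.0);
  }

  void add(int index, double delta) {
    entries.merge(index, delta, Double::sum);
  }

  void scale(double factor) {
    entries.replaceAll((index, value) -> value * factor);
  }
}

/** Hypothetical helper: averages read-only points into one freshly created mutable center. */
final class Centroids {
  static RandAccSparseVec mean(SeqSparseVec... points) {
    RandAccSparseVec center = new RandAccSparseVec();
    for (SeqSparseVec p : points) {
      p.addTo(center); // mutate only the center, never the points
    }
    if (points.length > 0) {
      center.scale(1.0 / points.length);
    }
    return center;
  }
}
```

Each point is read once in index order and only the center is mutated, so no per-addition clone of either vector is made.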

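Along the same lines, the A-B operations in the Manhattan distance computation need not allocate a difference vector at all. A minimal sketch, assuming both vectors are stored as sorted index arrays with parallel value arrays; `SparseManhattan.distance` is a hypothetical helper, and the two-pointer merge is a swapped-in technique, not taken from Mahout's code:

```java
/** Hypothetical helper: L1 distance between two sparse vectors held as sorted indices plus values. */
final class SparseManhattan {
  static double distance(int[] aIdx, double[] aVal, int[] bIdx, double[] bVal) {
    double sum = 0.0;
    int i = 0;
    int j = 0;
    // Merge-walk both sorted index lists; no temporary A-B vector is built.
    while (i < aIdx.length && j < bIdx.length) {
      if (aIdx[i] == bIdx[j]) {
        sum += Math.abs(aVal[i++] - bVal[j++]);
      } else if (aIdx[i] < bIdx[j]) {
        sum += Math.abs(aVal[i++]); // index only in a: b is 0 there
      } else {
        sum += Math.abs(bVal[j++]); // index only in b: a is 0 there
      }
    }
    // Whatever remains exists in only one vector and contributes |value|.
    while (i < aIdx.length) {
      sum += Math.abs(aVal[i++]);
    }
    while (j < bIdx.length) {
      sum += Math.abs(bVal[j++]);
    }
    return sum;
  }
}
```

The cost is linear in the combined number of nonzeros, and the only state is two cursors and a running sum.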