Push aggregate() too, but the results still don't look that great.

On Sun, Apr 28, 2013 at 9:10 PM, Dan Filimon <[email protected]>wrote:

> Yeah. It's there, but if you wait for a bit before having a look I'll also
> have results from aggregate().
>
>
> On Sun, Apr 28, 2013 at 9:08 PM, Robin Anil <[email protected]> wrote:
>
>> Do you have the code committed in your repo?
>>
>> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
>>
>>
>> On Sun, Apr 28, 2013 at 4:23 AM, Dan Filimon <[email protected]
>> >wrote:
>>
>> > Here's [1] the most recent benchmarks, after refactoring the Assign into
>> > separate classes.
>> > It's now using a 10 second runtime and a 1 second lead time and these
>> are
>> > only the benchmarks involving assign().
>> >
>> > Oddly enough there are regressions with RandomAccessSparseVectors'
>> plus()
>> > and minus() despite these working like trunk from what I can tell from
>> the
>> > tests (VectorBinaryAssignTest).
>> >
>> > Also, there's quite a bit of variance for some reason... Clone, Create
>>  are
>> > just two of the odd ones.
>> >
>> > [1]
>> >
>> >
>> https://docs.google.com/spreadsheet/ccc?key=0AochdzPoBmWodG9RTms1UG40YlNQd3ByUFpQY0FLWmc#gid=4
>> >
>> >
>> > On Fri, Apr 26, 2013 at 1:30 AM, Robin Anil <[email protected]>
>> wrote:
>> >
>> > > Do in place for rand an dense. Assign a bug to me to speed up rasv
>> > > GetElement.
>> > > On Apr 25, 2013 2:56 PM, "Dan Filimon" <[email protected]>
>> > > wrote:
>> > >
>> > > > Nearly done splitting the code up, but I'm not sure what the costs
>> > should
>> > > > ideally be.
>> > > >
>> > > > Robin, you proposed: cost of iteration + cost of lookup + cost of
>> > update
>> > > > (if its not in-place)
>> > > >
>> > > > This sounds like it's per element, rather than for the entire
>> vector.
>> > > Also,
>> > > > are we just going to assume that there will be as many updates as
>> > > non-zeros
>> > > > or as large as the array is? How does the cost of an update factor
>> in?
>> > > >
>> > > > And, is "in-place" just for DenseVectors?
>> > > >
>> > > >
>> > > > On Thu, Apr 25, 2013 at 8:04 PM, Robin Anil <[email protected]>
>> > > wrote:
>> > > >
>> > > > > Depends on the speed of copying a 1M double array v/s doing a
>> calloc
>> > +
>> > > > > copying 1000 non zeros (Assuming java is doing that underneath).
>> > > > >
>> > > > > ------
>> > > > > Robin Anil
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Thu, Apr 25, 2013 at 2:18 AM, Dan Filimon <
>> > > > [email protected]
>> > > > > >wrote:
>> > > > >
>> > > > > > Right, but is clone() generally slower than assigning? That
>> strikes
>> > > me
>> > > > as
>> > > > > > odd; doesn't Java optimize copying the internal structures
>> (there
>> > are
>> > > > > > arrays underneath after all)?
>> > > > > >
>> > > > > >
>> > > > > > On Thu, Apr 25, 2013 at 10:14 AM, Robin Anil <
>> [email protected]
>> > >
>> > > > > wrote:
>> > > > > >
>> > > > > >> Seems like for dense clone is slower than like().assign I need
>> to
>> > > test
>> > > > > it
>> > > > > >> with different sizes to be sure. I kept it from the old
>> behavior.
>> > > > > >> On Apr 25, 2013 2:12 AM, "Dan Filimon" <
>> > [email protected]
>> > > >
>> > > > > >> wrote:
>> > > > > >>
>> > > > > >>> Okay, so I should split it further into smaller sub-cases that
>> > > handle
>> > > > > >>> each Vector type. I tried making so that these match the
>> cases in
>> > > the
>> > > > > >>> document to the extent possible.
>> > > > > >>> You're right. It is ugly and I need to split it up.
>> > > > > >>>
>> > > > > >>> I removed the OrderedIntDoubleMapping (but with another if...)
>> > but
>> > > > one
>> > > > > >>> more thing that helped is making the assign() only go through
>> the
>> > > > > non-zeros
>> > > > > >>> when setting up a new copy.
>> > > > > >>>
>> > > > > >>> About the copying, I saw that the optimized copy doesn't
>> always
>> > > call
>> > > > > >>> clone clone on the Vector. Why is this?
>> > > > > >>>
>> > > > > >>> Thanks for the test and the feedback! :)
>> > > > > >>>
>> > > > > >>>
>> > > > > >>> On Thu, Apr 25, 2013 at 8:30 AM, Robin Anil <
>> > [email protected]
>> > > > > >wrote:
>> > > > > >>>
>> > > > > >>>> Yes its reaching dev. I checked out your code to take a
>> look. I
>> > > > > >>>> understand you are trying to still mix isSequentialAccess
>> etc in
>> > > the
>> > > > > >>>> validity calculation. The whole point of this architecture
>> is to
>> > > > > completely
>> > > > > >>>> unroll things to be more modular instead of creating a giant
>> > > > spaghetti
>> > > > > >>>>
>> > > > > >>>> Here is an example of what I am trying to say.
>> > > > > >>>>
>> > > > > >>>> This is your current implementation of
>> > > AssignIterateOneLookupOther.
>> > > > > >>>> There are so many conditionals
>> > > > > >>>>
>> > > > > >>>>
>> > > > > >>>>    1. @Override
>> > > > > >>>>    2.     public Vector assign(Vector x, Vector y,
>> > > > > >>>>    DoubleDoubleFunction f) {
>> > > > > >>>>    3.       return !swap ? assignInner(x, y, f) :
>> assignInner(y,
>> > > x,
>> > > > > f);
>> > > > > >>>>    4.     }
>> > > > > >>>>    5.
>> > > > > >>>>    6.     public Vector assignInner(Vector x, Vector y,
>> > > > > >>>>    DoubleDoubleFunction f) {
>> > > > > >>>>    7.       Iterator<Vector.Element> xi = x.iterateNonZero();
>> > > > > >>>>    8.       Vector.Element xe;
>> > > > > >>>>    9.       Vector.Element ye;
>> > > > > >>>>    10.       OrderedIntDoubleMapping updates =
>> > > > > newOrderedIntDoubleMapping();
>> > > > > >>>>    11.       while (xi.hasNext()) {
>> > > > > >>>>    12.         xe = xi.next();
>> > > > > >>>>    13.         ye = y.getElement(xe.index());
>> > > > > >>>>    14.         if (!swap) {
>> > > > > >>>>    15.           xe.set(f.apply(xe.get(), ye.get()));
>> > > > > >>>>    16.         } else {
>> > > > > >>>>    17.           if (ye.get() != 0.0 ||
>> y.isAddConstantTime()) {
>> > > > > >>>>    18.             ye.set(f.apply(ye.get(), xe.get()));
>> > > > > >>>>    19.           } else {
>> > > > > >>>>    20.             updates.set(xe.index(), f.apply(ye.get(),
>> > > > > >>>>    xe.get()));
>> > > > > >>>>    21.           }
>> > > > > >>>>    22.         }
>> > > > > >>>>    23.       }
>> > > > > >>>>    24.       if (swap && !y.isAddConstantTime()) {
>> > > > > >>>>    25.         y.mergeUpdates(updates);
>> > > > > >>>>    26.       }
>> > > > > >>>>    27.       return swap ? y : x;
>> > > > > >>>>    28.     }
>> > > > > >>>>    29.   }
>> > > > > >>>>
>> > > > > >>>> Split this into.
>> > > > > >>>>
>> > > > > >>>> IterateThisLookupThatInplaceUpdate
>> > > > > >>>> IterateThatLookupThisInplaceUpdate
>> > > > > >>>> IterateThisLookupThatMergeUpdate
>> > > > > >>>> IterateThatLookupThisMergeUpdate
>> > > > > >>>>
>> > > > > >>>> Removing the conditionals itself will speedup the operations
>> you
>> > > see
>> > > > > >>>> regression on.
>> > > > > >>>>
>> > > > > >>>> Also note that getElement() is not constant Add time for
>> RASV.
>> > In
>> > > > > >>>> earlier version we tried at best to set in place. That
>> reduced
>> > the
>> > > > > extra
>> > > > > >>>> hashkey computation. getElement in AbstractVector is really
>> > > > optimized
>> > > > > for
>> > > > > >>>> DenseVector and it does indicate as the cause of regression
>> you
>> > > are
>> > > > > seeing
>> > > > > >>>> in RASV. Another one is the unused new
>> > OrderedIntDoubleMapping().
>> > > > > Removing
>> > > > > >>>> all these will bring back the loss.
>> > > > > >>>>
>> > > > > >>>> Now the cost for each can be computed as: cost of iteration +
>> > cost
>> > > > of
>> > > > > >>>> lookup + cost of update (if its not in-place)
>> > > > > >>>>
>> > > > > >>>> See the attached file for a mock test. You will have to
>> rework
>> > the
>> > > > > >>>> expectations but the framework should be understandable.
>> > > > > >>>>
>> > > > > >>>>
>> > > > > >>>> ------
>> > > > > >>>> Robin Anil
>> > > > > >>>>
>> > > > > >>>>
>> > > > > >>>>
>> > > > > >>>> On Tue, Apr 23, 2013 at 4:09 AM, Dan Filimon <
>> > > > > >>>> [email protected]> wrote:
>> > > > > >>>>
>> > > > > >>>>> Here's [1] a link to the "design document" of the new vector
>> > > > > >>>>> operations so I can lay out the ideas behind what I'm doing
>> > more
>> > > > > clearly.
>> > > > > >>>>>
>> > > > > >>>>> I'd like anyone who can to have a look and comment.
>> > > > > >>>>>
>> > > > > >>>>> Ted,
>> > > > > >>>>> I know you'd like me to get back to working on clustering,
>> but
>> > > > > >>>>> currently, BallKMeans is roughly 2x as slow as Mahout's
>> KMeans.
>> > > > > >>>>> This is because of centroid.update() for one (it doesn't
>> have
>> > > > > >>>>> "interesting" properties that are exploited in the current)
>> and
>> > > > also
>> > > > > >>>>> because we're using we're just doing lots of Vector
>> operations:
>> > > dot
>> > > > > >>>>> products for projections, hash codes for unique sets of
>> > > candidates,
>> > > > > >>>>> equality checks etc.
>> > > > > >>>>>
>> > > > > >>>>> Here's a snapshot of the hotspots for one run of BallKMeans
>> (20
>> > > > > >>>>> newsgroups, unprojected, i.e 20K sparse vectors with 200K
>> > > > dimensions
>> > > > > and
>> > > > > >>>>> ~100 nonzero values):
>> > > > > >>>>>   Name Time (ms)
>> > >  org.apache.mahout.math.AbstractVector.dot(Vector)
>> > > > > >>>>> 104069  org.apache.mahout.math.DelegatingVector.hashCode()
>> > 30159
>> > > > >  org.apache.mahout.math.AbstractVector.assignIterateBoth(Iterator,
>> > > > > >>>>> Iterator, DoubleDoubleFunction) 13720
>> > > > > >>>>>
>> > > > >
>> > > >
>> > >
>> >
>> org.apache.mahout.math.OrderedIntDoubleMapping.merge(OrderedIntDoubleMapping)
>> > > > > >>>>> 2911  java.lang.ClassLoader.loadClass(String, boolean) 2620
>> > > > >  sun.misc.Launcher$AppClassLoader.loadClass(String,
>> > > > > >>>>> boolean) 2620
>> > > > > >>>>>
>> > > > > >>>>>
>> > > > > >>>>> You can see that the most time is spent doing Vector
>> > operations.
>> > > > > >>>>> Combined, the first three rows account for 91% of the
>> runtime.
>> > > > > >>>>>
>> > > > > >>>>> Thanks for your help!
>> > > > > >>>>>
>> > > > > >>>>> PS: Is this reaching dev@?
>> > > > > >>>>>
>> > > > > >>>>> [1]
>> > > > > >>>>>
>> > > > >
>> > > >
>> > >
>> >
>> https://docs.google.com/document/d/1g1PjUuvjyh2LBdq2_rKLIcUiDbeOORA1sCJiSsz-JVU/edit#
>> > > > > >>>>>
>> > > > > >>>>>
>> > > > > >>>>> On Tue, Apr 23, 2013 at 7:53 AM, Robin Anil <
>> > > [email protected]
>> > > > > >wrote:
>> > > > > >>>>>
>> > > > > >>>>>> Few more brain dump ideas
>> > > > > >>>>>>
>> > > > > >>>>>> For any operation (assign, aggregate) you are trying to do.
>> > > > > >>>>>> Compute set of "feasible" solutions: Based on all the
>> > properties
>> > > > of
>> > > > > >>>>>> functions
>> > > > > >>>>>> Choose the solution that minimises the cost.
>> > > > > >>>>>>
>> > > > > >>>>>> For this you may need to refactor all the special kinds of
>> > > > > iterations
>> > > > > >>>>>> and aggregations subclassing from AssignOperation and
>> > > > > AggregateOperation.
>> > > > > >>>>>> Each operation would return the cost given two input
>> vectors
>> > > > (based
>> > > > > on
>> > > > > >>>>>> their cost properties)
>> > > > > >>>>>>
>> > > > > >>>>>>
>> > > > > >>>>>>
>> > > > > >>>>>> ------
>> > > > > >>>>>> Robin Anil
>> > > > > >>>>>>
>> > > > > >>>>>>
>> > > > > >>>>>>
>> > > > > >>>>>> On Mon, Apr 22, 2013 at 8:46 PM, Robin Anil <
>> > > [email protected]
>> > > > > >wrote:
>> > > > > >>>>>>
>> > > > > >>>>>>> Strike that. I think the code is very confusing. You will
>> > have
>> > > > > >>>>>>> better luck if you create method for the three
>> > implementations.
>> > > > It
>> > > > > will
>> > > > > >>>>>>> make things explicit and easy to debug.
>> > > > > >>>>>>>
>> > > > > >>>>>>> public final double getIterationCost() {
>> > > > > >>>>>>>  return one of {cardinality, 1.2*numNonDefault,
>> > numNonDefault}
>> > > > > >>>>>>> }
>> > > > > >>>>>>>
>> > > > > >>>>>>> public final double getLookupCost() {
>> > > > > >>>>>>>  return one of {1, 1.5(lets say), log_2(numNonDefault)}
>> > > > > >>>>>>> }
>> > > > > >>>>>>>
>> > > > > >>>>>>> All your logic then becomes a minimization of the total
>> cost.
>> > > > > >>>>>>>
>> > > > > >>>>>>> This is more extendable than current isDense(),
>> > isSequential(),
>> > > > > >>>>>>> isSparse(), isRandom() and doesn't assume what the
>> underlying
>> > > > > >>>>>>> implementation is.
>> > > > > >>>>>>>
>> > > > > >>>>>>>
>> > > > > >>>>>>> ------
>> > > > > >>>>>>> Robin Anil
>> > > > > >>>>>>>
>> > > > > >>>>>>>
>> > > > > >>>>>>>
>> > > > > >>>>>>> On Mon, Apr 22, 2013 at 8:09 PM, Robin Anil <
>> > > > [email protected]
>> > > > > >wrote:
>> > > > > >>>>>>>
>> > > > > >>>>>>>>  Also the calculation for cost is wrong. RASV is HashMap
>> > based
>> > > > its
>> > > > > >>>>>>>> cost O(n) not O(log n)
>> > > > > >>>>>>>>
>> > > > > >>>>>>>> ------
>> > > > > >>>>>>>> Robin Anil
>> > > > > >>>>>>>>
>> > > > > >>>>>>>>
>> > > > > >>>>>>>>
>> > > > > >>>>>>>> On Mon, Apr 22, 2013 at 4:43 PM, Robin Anil <
>> > > > [email protected]
>> > > > > >wrote:
>> > > > > >>>>>>>>
>> > > > > >>>>>>>>>    This is an automatically generated e-mail. To reply,
>> > > visit:
>> > > > > >>>>>>>>> https://reviews.apache.org/r/10669/
>> > > > > >>>>>>>>>
>> > > >  math/src/main/java/org/apache/mahout/math/AbstractVector.java<
>> > > > >
>> > >
>> https://reviews.apache.org/r/10669/diff/2/?file=283185#file283185line56>
>> > > > > (Diff
>> > > > > >>>>>>>>> revision 2)
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>> None
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>    55
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>     boolean commutativeAndAssociative =
>> > > > > aggregator.isCommutative() && aggregator.isAssociative();
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>   This should be a method in AbstractFunction interface
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>
>> > > >  math/src/main/java/org/apache/mahout/math/AbstractVector.java<
>> > > > >
>> > > >
>> > >
>> >
>> https://reviews.apache.org/r/10669/diff/2/?file=283185#file283185line1037
>> >
>> > > > > (Diff
>> > > > > >>>>>>>>> revision 2)
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>> {'text': 'public int getNumNonZeroElements() {', 'line':
>> > 726,
>> > > > > 'expand_offset': 2}
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>> {'text': 'public int getNumNonZeroElements() {', 'line':
>> > 927,
>> > > > > 'expand_offset': 2}
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>    942
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>       // Make all the non-zero values 0.
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>   vector.clear()
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>
>> > > >  math/src/main/java/org/apache/mahout/math/AbstractVector.java<
>> > > > >
>> > > >
>> > >
>> >
>> https://reviews.apache.org/r/10669/diff/2/?file=283185#file283185line1047
>> >
>> > > > > (Diff
>> > > > > >>>>>>>>> revision 2)
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>> {'text': 'public int getNumNonZeroElements() {', 'line':
>> > 726,
>> > > > > 'expand_offset': 2}
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>> {'text': 'public int getNumNonZeroElements() {', 'line':
>> > 927,
>> > > > > 'expand_offset': 2}
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>    952
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>       while (it.hasNext()) {
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>   There is no guarantee of ordered iteration. The
>> updates
>> > is
>> > > > > unsorted. merge typically implies that input is sorted
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>
>> > > >  math/src/main/java/org/apache/mahout/math/AbstractVector.java<
>> > > > >
>> > > >
>> > >
>> >
>> https://reviews.apache.org/r/10669/diff/2/?file=283185#file283185line1074
>> >
>> > > > > (Diff
>> > > > > >>>>>>>>> revision 2)
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>> {'text': 'public int getNumNonZeroElements() {', 'line':
>> > 726,
>> > > > > 'expand_offset': 2}
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>> {'text': 'public int getNumNonZeroElements() {', 'line':
>> > 927,
>> > > > > 'expand_offset': 2}
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>    979
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>         element.set(values[index]);
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>   What if there are values that are not in the existing
>> > > data. I
>> > > > > dont this is correct
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>
>> > > >  math/src/main/java/org/apache/mahout/math/AbstractVector.java<
>> > > > >
>> > > >
>> > >
>> >
>> https://reviews.apache.org/r/10669/diff/2/?file=283185#file283185line1100
>> >
>> > > > > (Diff
>> > > > > >>>>>>>>> revision 2)
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>> {'text': 'public int getNumNonZeroElements() {', 'line':
>> > 726,
>> > > > > 'expand_offset': 2}
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>> {'text': 'public int getNumNonZeroElements() {', 'line':
>> > 927,
>> > > > > 'expand_offset': 2}
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>    999
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>     invalidateCachedLength();
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>   move invalidateCachedLength to the set method.
>> Instead of
>> > > > > dispersing it everywhere.
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>
>> > > >  math/src/main/java/org/apache/mahout/math/AbstractVector.java<
>> > > > >
>> > > >
>> > >
>> >
>> https://reviews.apache.org/r/10669/diff/2/?file=283185#file283185line1115
>> >
>> > > > > (Diff
>> > > > > >>>>>>>>> revision 2)
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>> None
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>> {'text': '  private Vector
>> > > assignIterateBoth(Iterator<Element>
>> > > > > thisIterator, Iterator<Element> thatIterator,', 'line': 1070}
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>     1014
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>   protected Vector
>> > > assignSkipZerosIterateOne(Iterator<Element>
>> > > > > thisIterator, Vector other,
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>   be consistent, this that or thisIterator that
>> Iterator.
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>
>> > > >  math/src/main/java/org/apache/mahout/math/AbstractVector.java<
>> > > > >
>> > > >
>> > >
>> >
>> https://reviews.apache.org/r/10669/diff/2/?file=283185#file283185line1171
>> >
>> > > > > (Diff
>> > > > > >>>>>>>>> revision 2)
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>> None
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>> {'text': '  private Vector
>> > > assignIterateBoth(Iterator<Element>
>> > > > > thisIterator, Iterator<Element> thatIterator,', 'line': 1070}
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>     1070
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>   private Vector assignIterateBoth(Iterator<Element>
>> > > > > thisIterator, Iterator<Element> thatIterator,
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>   This expects the data to to be sorted. indicate that
>> in a
>> > > > > comment. Similarly comment other places
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>
>> > > > > math/src/main/java/org/apache/mahout/math/function/Functions.java<
>> > > > >
>> > >
>> https://reviews.apache.org/r/10669/diff/2/?file=283202#file283202line900
>> > > > >
>> > > > > (Diff
>> > > > > >>>>>>>>> revision 2)
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>> None
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>> {'text': '      public boolean isLikeLeftMult() {',
>> 'line':
>> > > > 892}
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>    896
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>       /**
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>   Move this comments to the base class
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>> - Robin
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>> On April 22nd, 2013, 8:45 p.m., Dan Filimon wrote:
>> > > > > >>>>>>>>>   Review request for mahout, Ted Dunning, Sebastian
>> > Schelter,
>> > > > and
>> > > > > >>>>>>>>> Robin Anil.
>> > > > > >>>>>>>>> By Dan Filimon.
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>> *Updated April 22, 2013, 8:45 p.m.*
>> > > > > >>>>>>>>> Description
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>> This patch contains code cleaning up AbstractVector and
>> > > making
>> > > > > the operations as fast as possible while still having a high level
>> > > > > interface.
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>> The main changes are in AbstractVector as well as new
>> > methods
>> > > > in
>> > > > > DoubleDoubleFunction.
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>   Testing
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>> The vectors test pass but it's likely that the patch in
>> > it's
>> > > > > current state is broken as other, unrelated tests (BallKMeans...)
>> are
>> > > > > failing.
>> > > > > >>>>>>>>> Also, my Hadoop conf is broken so I didn't run all the
>> core
>> > > > > tests. Anyone?
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>> I can't seem to find the bug, so _please_ have a closer
>> > look.
>> > > > > It's still a work in progress.
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>> The benchmarks seem comparable (although there are some
>> > > jarring
>> > > > > diferences – Minkowski distance seems a lot slower in new-dan-1
>> than
>> > > > > old-trunk-2). It may be however that this is just variance due to
>> the
>> > > > load
>> > > > > of the machine at the time. I'm having trouble interpreting the
>> > > > benchmarks
>> > > > > in general, so anyone who could give me a hand is more than
>> welcome.
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>   Diffs
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>    -
>> > > > > math/src/main/java/org/apache/mahout/math/AbstractMatrix.java
>> > > > > >>>>>>>>>    (e12aa38)
>> > > > > >>>>>>>>>    -
>> > > > > math/src/main/java/org/apache/mahout/math/AbstractVector.java
>> > > > > >>>>>>>>>    (090aa7a)
>> > > > > >>>>>>>>>    -
>> > math/src/main/java/org/apache/mahout/math/Centroid.java
>> > > > > >>>>>>>>>    (0c42196)
>> > > > > >>>>>>>>>    -
>> > > > > math/src/main/java/org/apache/mahout/math/ConstantVector.java
>> > > > > >>>>>>>>>    (51d67d4)
>> > > > > >>>>>>>>>    -
>> > > > > math/src/main/java/org/apache/mahout/math/DelegatingVector.java
>> > > > > >>>>>>>>>    (12220d4)
>> > > > > >>>>>>>>>    -
>> > > math/src/main/java/org/apache/mahout/math/DenseVector.java
>> > > > > >>>>>>>>>    (41c356b)
>> > > > > >>>>>>>>>    -
>> > > > >
>> > > >
>> > >
>> >
>> math/src/main/java/org/apache/mahout/math/FileBasedSparseBinaryMatrix.java
>> > > > > >>>>>>>>>    (094003b)
>> > > > > >>>>>>>>>    -
>> > > math/src/main/java/org/apache/mahout/math/MatrixSlice.java
>> > > > > >>>>>>>>>    (7f79c96)
>> > > > > >>>>>>>>>    -
>> > > > > math/src/main/java/org/apache/mahout/math/MatrixVectorView.java
>> > > > > >>>>>>>>>    (af70727)
>> > > > > >>>>>>>>>    -
>> > > math/src/main/java/org/apache/mahout/math/NamedVector.java
>> > > > > >>>>>>>>>    (4b7a41d)
>> > > > > >>>>>>>>>    -
>> > > > >
>> > math/src/main/java/org/apache/mahout/math/OrderedIntDoubleMapping.java
>> > > > > >>>>>>>>>    (650d82d)
>> > > > > >>>>>>>>>    -
>> > > > > math/src/main/java/org/apache/mahout/math/PermutedVectorView.java
>> > > > > >>>>>>>>>    (d1ea93a)
>> > > > > >>>>>>>>>    -
>> > > > >
>> > math/src/main/java/org/apache/mahout/math/RandomAccessSparseVector.java
>> > > > > >>>>>>>>>    (6f85692)
>> > > > > >>>>>>>>>    -
>> > > > >
>> > > >
>> > >
>> >
>> math/src/main/java/org/apache/mahout/math/SequentialAccessSparseVector.java
>> > > > > >>>>>>>>>    (21982f9)
>> > > > > >>>>>>>>>    -
>> math/src/main/java/org/apache/mahout/math/Vector.java
>> > > > > >>>>>>>>>    (2f8b417)
>> > > > > >>>>>>>>>    -
>> > > math/src/main/java/org/apache/mahout/math/VectorView.java
>> > > > > >>>>>>>>>    (add2a60)
>> > > > > >>>>>>>>>    -
>> > > > > math/src/main/java/org/apache/mahout/math/WeightedVector.java
>> > > > > >>>>>>>>>    (06fbd05)
>> > > > > >>>>>>>>>    -
>> > > > >
>> > > >
>> > >
>> >
>> math/src/main/java/org/apache/mahout/math/function/DoubleDoubleFunction.java
>> > > > > >>>>>>>>>    (82b350a)
>> > > > > >>>>>>>>>    -
>> > > > > math/src/main/java/org/apache/mahout/math/function/Functions.java
>> > > > > >>>>>>>>>    (eb2e42f)
>> > > > > >>>>>>>>>    -
>> > > > > math/src/main/java/org/apache/mahout/math/function/PlusMult.java
>> > > > > >>>>>>>>>    (60587b1)
>> > > > > >>>>>>>>>    -
>> > > > >
>> math/src/main/java/org/apache/mahout/math/function/TimesFunction.java
>> > > > > >>>>>>>>>    (8ab0bb1)
>> > > > > >>>>>>>>>    -
>> > > > > math/src/main/java/org/apache/mahout/math/jet/math/Constants.java
>> > > > > >>>>>>>>>    (53535d6)
>> > > > > >>>>>>>>>    -
>> > > > > math/src/test/java/org/apache/mahout/math/AbstractVectorTest.java
>> > > > > >>>>>>>>>    (2b11199)
>> > > > > >>>>>>>>>    -
>> > > math/src/test/java/org/apache/mahout/math/VectorTest.java
>> > > > > >>>>>>>>>    (d6d554b)
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>> View Diff <https://reviews.apache.org/r/10669/diff/>
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>
>> > > > > >>>>>>>>
>> > > > > >>>>>>>
>> > > > > >>>>>>
>> > > > > >>>>>
>> > > > > >>>>
>> > > > > >>>
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>

Reply via email to