Push aggregate() too, but the results still don't look that great.
On Sun, Apr 28, 2013 at 9:10 PM, Dan Filimon <[email protected]>wrote: > Yeah. It's there, but if you wait for a bit before having a look I'll also > have results from aggregate(). > > > On Sun, Apr 28, 2013 at 9:08 PM, Robin Anil <[email protected]> wrote: > >> Do you have the code committed in your repo? >> >> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc. >> >> >> On Sun, Apr 28, 2013 at 4:23 AM, Dan Filimon <[email protected] >> >wrote: >> >> > Here's [1] the most recent benchmarks, after refactoring the Assign into >> > separate classes. >> > It's now using a 10 second runtime and a 1 second lead time and these >> are >> > only the benchmarks involving assign(). >> > >> > Oddly enough there are regressions with RandomAccessSparseVectors' >> plus() >> > and minus() despite these working like trunk from what I can tell from >> the >> > tests (VectorBinaryAssignTest). >> > >> > Also, there's quite a bit of variance for some reason... Clone, Create >> are >> > just two of the odd ones. >> > >> > [1] >> > >> > >> https://docs.google.com/spreadsheet/ccc?key=0AochdzPoBmWodG9RTms1UG40YlNQd3ByUFpQY0FLWmc#gid=4 >> > >> > >> > On Fri, Apr 26, 2013 at 1:30 AM, Robin Anil <[email protected]> >> wrote: >> > >> > > Do in place for rand an dense. Assign a bug to me to speed up rasv >> > > GetElement. >> > > On Apr 25, 2013 2:56 PM, "Dan Filimon" <[email protected]> >> > > wrote: >> > > >> > > > Nearly done splitting the code up, but I'm not sure what the costs >> > should >> > > > ideally be. >> > > > >> > > > Robin, you proposed: cost of iteration + cost of lookup + cost of >> > update >> > > > (if its not in-place) >> > > > >> > > > This sounds like it's per element, rather than for the entire >> vector. >> > > Also, >> > > > are we just going to assume that there will be as many updates as >> > > non-zeros >> > > > or as large as the array is? How does the cost of an update factor >> in? >> > > > >> > > > And, is "in-place" just for DenseVectors? >> > > > >> > > > >> > > > On Thu, Apr 25, 2013 at 8:04 PM, Robin Anil <[email protected]> >> > > wrote: >> > > > >> > > > > Depends on the speed of copying a 1M double array v/s doing a >> calloc >> > + >> > > > > copying 1000 non zeros (Assuming java is doing that underneath). >> > > > > >> > > > > ------ >> > > > > Robin Anil >> > > > > >> > > > > >> > > > > >> > > > > On Thu, Apr 25, 2013 at 2:18 AM, Dan Filimon < >> > > > [email protected] >> > > > > >wrote: >> > > > > >> > > > > > Right, but is clone() generally slower than assigning? That >> strikes >> > > me >> > > > as >> > > > > > odd; doesn't Java optimize copying the internal structures >> (there >> > are >> > > > > > arrays underneath after all)? >> > > > > > >> > > > > > >> > > > > > On Thu, Apr 25, 2013 at 10:14 AM, Robin Anil < >> [email protected] >> > > >> > > > > wrote: >> > > > > > >> > > > > >> Seems like for dense clone is slower than like().assign I need >> to >> > > test >> > > > > it >> > > > > >> with different sizes to be sure. I kept it from the old >> behavior. >> > > > > >> On Apr 25, 2013 2:12 AM, "Dan Filimon" < >> > [email protected] >> > > > >> > > > > >> wrote: >> > > > > >> >> > > > > >>> Okay, so I should split it further into smaller sub-cases that >> > > handle >> > > > > >>> each Vector type. I tried making so that these match the >> cases in >> > > the >> > > > > >>> document to the extent possible. >> > > > > >>> You're right. It is ugly and I need to split it up. >> > > > > >>> >> > > > > >>> I removed the OrderedIntDoubleMapping (but with another if...) >> > but >> > > > one >> > > > > >>> more thing that helped is making the assign() only go through >> the >> > > > > non-zeros >> > > > > >>> when setting up a new copy. >> > > > > >>> >> > > > > >>> About the copying, I saw that the optimized copy doesn't >> always >> > > call >> > > > > >>> clone clone on the Vector. Why is this? >> > > > > >>> >> > > > > >>> Thanks for the test and the feedback! :) >> > > > > >>> >> > > > > >>> >> > > > > >>> On Thu, Apr 25, 2013 at 8:30 AM, Robin Anil < >> > [email protected] >> > > > > >wrote: >> > > > > >>> >> > > > > >>>> Yes its reaching dev. I checked out your code to take a >> look. I >> > > > > >>>> understand you are trying to still mix isSequentialAccess >> etc in >> > > the >> > > > > >>>> validity calculation. The whole point of this architecture >> is to >> > > > > completely >> > > > > >>>> unroll things to be more modular instead of creating a giant >> > > > spaghetti >> > > > > >>>> >> > > > > >>>> Here is an example of what I am trying to say. >> > > > > >>>> >> > > > > >>>> This is your current implementation of >> > > AssignIterateOneLookupOther. >> > > > > >>>> There are so many conditionals >> > > > > >>>> >> > > > > >>>> >> > > > > >>>> 1. @Override >> > > > > >>>> 2. public Vector assign(Vector x, Vector y, >> > > > > >>>> DoubleDoubleFunction f) { >> > > > > >>>> 3. return !swap ? assignInner(x, y, f) : >> assignInner(y, >> > > x, >> > > > > f); >> > > > > >>>> 4. } >> > > > > >>>> 5. >> > > > > >>>> 6. public Vector assignInner(Vector x, Vector y, >> > > > > >>>> DoubleDoubleFunction f) { >> > > > > >>>> 7. Iterator<Vector.Element> xi = x.iterateNonZero(); >> > > > > >>>> 8. Vector.Element xe; >> > > > > >>>> 9. Vector.Element ye; >> > > > > >>>> 10. OrderedIntDoubleMapping updates = >> > > > > newOrderedIntDoubleMapping(); >> > > > > >>>> 11. while (xi.hasNext()) { >> > > > > >>>> 12. xe = xi.next(); >> > > > > >>>> 13. ye = y.getElement(xe.index()); >> > > > > >>>> 14. if (!swap) { >> > > > > >>>> 15. xe.set(f.apply(xe.get(), ye.get())); >> > > > > >>>> 16. } else { >> > > > > >>>> 17. if (ye.get() != 0.0 || >> y.isAddConstantTime()) { >> > > > > >>>> 18. ye.set(f.apply(ye.get(), xe.get())); >> > > > > >>>> 19. } else { >> > > > > >>>> 20. updates.set(xe.index(), f.apply(ye.get(), >> > > > > >>>> xe.get())); >> > > > > >>>> 21. } >> > > > > >>>> 22. } >> > > > > >>>> 23. } >> > > > > >>>> 24. if (swap && !y.isAddConstantTime()) { >> > > > > >>>> 25. y.mergeUpdates(updates); >> > > > > >>>> 26. } >> > > > > >>>> 27. return swap ? y : x; >> > > > > >>>> 28. } >> > > > > >>>> 29. } >> > > > > >>>> >> > > > > >>>> Split this into. >> > > > > >>>> >> > > > > >>>> IterateThisLookupThatInplaceUpdate >> > > > > >>>> IterateThatLookupThisInplaceUpdate >> > > > > >>>> IterateThisLookupThatMergeUpdate >> > > > > >>>> IterateThatLookupThisMergeUpdate >> > > > > >>>> >> > > > > >>>> Removing the conditionals itself will speedup the operations >> you >> > > see >> > > > > >>>> regression on. >> > > > > >>>> >> > > > > >>>> Also note that getElement() is not constant Add time for >> RASV. >> > In >> > > > > >>>> earlier version we tried at best to set in place. That >> reduced >> > the >> > > > > extra >> > > > > >>>> hashkey computation. getElement in AbstractVector is really >> > > > optimized >> > > > > for >> > > > > >>>> DenseVector and it does indicate as the cause of regression >> you >> > > are >> > > > > seeing >> > > > > >>>> in RASV. Another one is the unused new >> > OrderedIntDoubleMapping(). >> > > > > Removing >> > > > > >>>> all these will bring back the loss. >> > > > > >>>> >> > > > > >>>> Now the cost for each can be computed as: cost of iteration + >> > cost >> > > > of >> > > > > >>>> lookup + cost of update (if its not in-place) >> > > > > >>>> >> > > > > >>>> See the attached file for a mock test. You will have to >> rework >> > the >> > > > > >>>> expectations but the framework should be understandable. >> > > > > >>>> >> > > > > >>>> >> > > > > >>>> ------ >> > > > > >>>> Robin Anil >> > > > > >>>> >> > > > > >>>> >> > > > > >>>> >> > > > > >>>> On Tue, Apr 23, 2013 at 4:09 AM, Dan Filimon < >> > > > > >>>> [email protected]> wrote: >> > > > > >>>> >> > > > > >>>>> Here's [1] a link to the "design document" of the new vector >> > > > > >>>>> operations so I can lay out the ideas behind what I'm doing >> > more >> > > > > clearly. >> > > > > >>>>> >> > > > > >>>>> I'd like anyone who can to have a look and comment. >> > > > > >>>>> >> > > > > >>>>> Ted, >> > > > > >>>>> I know you'd like me to get back to working on clustering, >> but >> > > > > >>>>> currently, BallKMeans is roughly 2x as slow as Mahout's >> KMeans. >> > > > > >>>>> This is because of centroid.update() for one (it doesn't >> have >> > > > > >>>>> "interesting" properties that are exploited in the current) >> and >> > > > also >> > > > > >>>>> because we're using we're just doing lots of Vector >> operations: >> > > dot >> > > > > >>>>> products for projections, hash codes for unique sets of >> > > candidates, >> > > > > >>>>> equality checks etc. >> > > > > >>>>> >> > > > > >>>>> Here's a snapshot of the hotspots for one run of BallKMeans >> (20 >> > > > > >>>>> newsgroups, unprojected, i.e 20K sparse vectors with 200K >> > > > dimensions >> > > > > and >> > > > > >>>>> ~100 nonzero values): >> > > > > >>>>> Name Time (ms) >> > > org.apache.mahout.math.AbstractVector.dot(Vector) >> > > > > >>>>> 104069 org.apache.mahout.math.DelegatingVector.hashCode() >> > 30159 >> > > > > org.apache.mahout.math.AbstractVector.assignIterateBoth(Iterator, >> > > > > >>>>> Iterator, DoubleDoubleFunction) 13720 >> > > > > >>>>> >> > > > > >> > > > >> > > >> > >> org.apache.mahout.math.OrderedIntDoubleMapping.merge(OrderedIntDoubleMapping) >> > > > > >>>>> 2911 java.lang.ClassLoader.loadClass(String, boolean) 2620 >> > > > > sun.misc.Launcher$AppClassLoader.loadClass(String, >> > > > > >>>>> boolean) 2620 >> > > > > >>>>> >> > > > > >>>>> >> > > > > >>>>> You can see that the most time is spent doing Vector >> > operations. >> > > > > >>>>> Combined, the first three rows account for 91% of the >> runtime. >> > > > > >>>>> >> > > > > >>>>> Thanks for your help! >> > > > > >>>>> >> > > > > >>>>> PS: Is this reaching dev@? >> > > > > >>>>> >> > > > > >>>>> [1] >> > > > > >>>>> >> > > > > >> > > > >> > > >> > >> https://docs.google.com/document/d/1g1PjUuvjyh2LBdq2_rKLIcUiDbeOORA1sCJiSsz-JVU/edit# >> > > > > >>>>> >> > > > > >>>>> >> > > > > >>>>> On Tue, Apr 23, 2013 at 7:53 AM, Robin Anil < >> > > [email protected] >> > > > > >wrote: >> > > > > >>>>> >> > > > > >>>>>> Few more brain dump ideas >> > > > > >>>>>> >> > > > > >>>>>> For any operation (assign, aggregate) you are trying to do. >> > > > > >>>>>> Compute set of "feasible" solutions: Based on all the >> > properties >> > > > of >> > > > > >>>>>> functions >> > > > > >>>>>> Choose the solution that minimises the cost. >> > > > > >>>>>> >> > > > > >>>>>> For this you may need to refactor all the special kinds of >> > > > > iterations >> > > > > >>>>>> and aggregations subclassing from AssignOperation and >> > > > > AggregateOperation. >> > > > > >>>>>> Each operation would return the cost given two input >> vectors >> > > > (based >> > > > > on >> > > > > >>>>>> their cost properties) >> > > > > >>>>>> >> > > > > >>>>>> >> > > > > >>>>>> >> > > > > >>>>>> ------ >> > > > > >>>>>> Robin Anil >> > > > > >>>>>> >> > > > > >>>>>> >> > > > > >>>>>> >> > > > > >>>>>> On Mon, Apr 22, 2013 at 8:46 PM, Robin Anil < >> > > [email protected] >> > > > > >wrote: >> > > > > >>>>>> >> > > > > >>>>>>> Strike that. I think the code is very confusing. You will >> > have >> > > > > >>>>>>> better luck if you create method for the three >> > implementations. >> > > > It >> > > > > will >> > > > > >>>>>>> make things explicit and easy to debug. >> > > > > >>>>>>> >> > > > > >>>>>>> public final double getIterationCost() { >> > > > > >>>>>>> return one of {cardinality, 1.2*numNonDefault, >> > numNonDefault} >> > > > > >>>>>>> } >> > > > > >>>>>>> >> > > > > >>>>>>> public final double getLookupCost() { >> > > > > >>>>>>> return one of {1, 1.5(lets say), log_2(numNonDefault)} >> > > > > >>>>>>> } >> > > > > >>>>>>> >> > > > > >>>>>>> All your logic then becomes a minimization of the total >> cost. >> > > > > >>>>>>> >> > > > > >>>>>>> This is more extendable than current isDense(), >> > isSequential(), >> > > > > >>>>>>> isSparse(), isRandom() and doesn't assume what the >> underlying >> > > > > >>>>>>> implementation is. >> > > > > >>>>>>> >> > > > > >>>>>>> >> > > > > >>>>>>> ------ >> > > > > >>>>>>> Robin Anil >> > > > > >>>>>>> >> > > > > >>>>>>> >> > > > > >>>>>>> >> > > > > >>>>>>> On Mon, Apr 22, 2013 at 8:09 PM, Robin Anil < >> > > > [email protected] >> > > > > >wrote: >> > > > > >>>>>>> >> > > > > >>>>>>>> Also the calculation for cost is wrong. RASV is HashMap >> > based >> > > > its >> > > > > >>>>>>>> cost O(n) not O(log n) >> > > > > >>>>>>>> >> > > > > >>>>>>>> ------ >> > > > > >>>>>>>> Robin Anil >> > > > > >>>>>>>> >> > > > > >>>>>>>> >> > > > > >>>>>>>> >> > > > > >>>>>>>> On Mon, Apr 22, 2013 at 4:43 PM, Robin Anil < >> > > > [email protected] >> > > > > >wrote: >> > > > > >>>>>>>> >> > > > > >>>>>>>>> This is an automatically generated e-mail. To reply, >> > > visit: >> > > > > >>>>>>>>> https://reviews.apache.org/r/10669/ >> > > > > >>>>>>>>> >> > > > math/src/main/java/org/apache/mahout/math/AbstractVector.java< >> > > > > >> > > >> https://reviews.apache.org/r/10669/diff/2/?file=283185#file283185line56> >> > > > > (Diff >> > > > > >>>>>>>>> revision 2) >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> None >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> 55 >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> boolean commutativeAndAssociative = >> > > > > aggregator.isCommutative() && aggregator.isAssociative(); >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> This should be a method in AbstractFunction interface >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> >> > > > math/src/main/java/org/apache/mahout/math/AbstractVector.java< >> > > > > >> > > > >> > > >> > >> https://reviews.apache.org/r/10669/diff/2/?file=283185#file283185line1037 >> > >> > > > > (Diff >> > > > > >>>>>>>>> revision 2) >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> {'text': 'public int getNumNonZeroElements() {', 'line': >> > 726, >> > > > > 'expand_offset': 2} >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> {'text': 'public int getNumNonZeroElements() {', 'line': >> > 927, >> > > > > 'expand_offset': 2} >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> 942 >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> // Make all the non-zero values 0. >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> vector.clear() >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> >> > > > math/src/main/java/org/apache/mahout/math/AbstractVector.java< >> > > > > >> > > > >> > > >> > >> https://reviews.apache.org/r/10669/diff/2/?file=283185#file283185line1047 >> > >> > > > > (Diff >> > > > > >>>>>>>>> revision 2) >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> {'text': 'public int getNumNonZeroElements() {', 'line': >> > 726, >> > > > > 'expand_offset': 2} >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> {'text': 'public int getNumNonZeroElements() {', 'line': >> > 927, >> > > > > 'expand_offset': 2} >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> 952 >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> while (it.hasNext()) { >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> There is no guarantee of ordered iteration. The >> updates >> > is >> > > > > unsorted. merge typically implies that input is sorted >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> >> > > > math/src/main/java/org/apache/mahout/math/AbstractVector.java< >> > > > > >> > > > >> > > >> > >> https://reviews.apache.org/r/10669/diff/2/?file=283185#file283185line1074 >> > >> > > > > (Diff >> > > > > >>>>>>>>> revision 2) >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> {'text': 'public int getNumNonZeroElements() {', 'line': >> > 726, >> > > > > 'expand_offset': 2} >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> {'text': 'public int getNumNonZeroElements() {', 'line': >> > 927, >> > > > > 'expand_offset': 2} >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> 979 >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> element.set(values[index]); >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> What if there are values that are not in the existing >> > > data. I >> > > > > dont this is correct >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> >> > > > math/src/main/java/org/apache/mahout/math/AbstractVector.java< >> > > > > >> > > > >> > > >> > >> https://reviews.apache.org/r/10669/diff/2/?file=283185#file283185line1100 >> > >> > > > > (Diff >> > > > > >>>>>>>>> revision 2) >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> {'text': 'public int getNumNonZeroElements() {', 'line': >> > 726, >> > > > > 'expand_offset': 2} >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> {'text': 'public int getNumNonZeroElements() {', 'line': >> > 927, >> > > > > 'expand_offset': 2} >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> 999 >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> invalidateCachedLength(); >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> move invalidateCachedLength to the set method. >> Instead of >> > > > > dispersing it everywhere. >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> >> > > > math/src/main/java/org/apache/mahout/math/AbstractVector.java< >> > > > > >> > > > >> > > >> > >> https://reviews.apache.org/r/10669/diff/2/?file=283185#file283185line1115 >> > >> > > > > (Diff >> > > > > >>>>>>>>> revision 2) >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> None >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> {'text': ' private Vector >> > > assignIterateBoth(Iterator<Element> >> > > > > thisIterator, Iterator<Element> thatIterator,', 'line': 1070} >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> 1014 >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> protected Vector >> > > assignSkipZerosIterateOne(Iterator<Element> >> > > > > thisIterator, Vector other, >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> be consistent, this that or thisIterator that >> Iterator. >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> >> > > > math/src/main/java/org/apache/mahout/math/AbstractVector.java< >> > > > > >> > > > >> > > >> > >> https://reviews.apache.org/r/10669/diff/2/?file=283185#file283185line1171 >> > >> > > > > (Diff >> > > > > >>>>>>>>> revision 2) >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> None >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> {'text': ' private Vector >> > > assignIterateBoth(Iterator<Element> >> > > > > thisIterator, Iterator<Element> thatIterator,', 'line': 1070} >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> 1070 >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> private Vector assignIterateBoth(Iterator<Element> >> > > > > thisIterator, Iterator<Element> thatIterator, >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> This expects the data to to be sorted. indicate that >> in a >> > > > > comment. Similarly comment other places >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> >> > > > > math/src/main/java/org/apache/mahout/math/function/Functions.java< >> > > > > >> > > >> https://reviews.apache.org/r/10669/diff/2/?file=283202#file283202line900 >> > > > > >> > > > > (Diff >> > > > > >>>>>>>>> revision 2) >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> None >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> {'text': ' public boolean isLikeLeftMult() {', >> 'line': >> > > > 892} >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> 896 >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> /** >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> Move this comments to the base class >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> - Robin >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> On April 22nd, 2013, 8:45 p.m., Dan Filimon wrote: >> > > > > >>>>>>>>> Review request for mahout, Ted Dunning, Sebastian >> > Schelter, >> > > > and >> > > > > >>>>>>>>> Robin Anil. >> > > > > >>>>>>>>> By Dan Filimon. >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> *Updated April 22, 2013, 8:45 p.m.* >> > > > > >>>>>>>>> Description >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> This patch contains code cleaning up AbstractVector and >> > > making >> > > > > the operations as fast as possible while still having a high level >> > > > > interface. >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> The main changes are in AbstractVector as well as new >> > methods >> > > > in >> > > > > DoubleDoubleFunction. >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> Testing >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> The vectors test pass but it's likely that the patch in >> > it's >> > > > > current state is broken as other, unrelated tests (BallKMeans...) >> are >> > > > > failing. >> > > > > >>>>>>>>> Also, my Hadoop conf is broken so I didn't run all the >> core >> > > > > tests. Anyone? >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> I can't seem to find the bug, so _please_ have a closer >> > look. >> > > > > It's still a work in progress. >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> The benchmarks seem comparable (although there are some >> > > jarring >> > > > > diferences – Minkowski distance seems a lot slower in new-dan-1 >> than >> > > > > old-trunk-2). It may be however that this is just variance due to >> the >> > > > load >> > > > > of the machine at the time. I'm having trouble interpreting the >> > > > benchmarks >> > > > > in general, so anyone who could give me a hand is more than >> welcome. >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> Diffs >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> - >> > > > > math/src/main/java/org/apache/mahout/math/AbstractMatrix.java >> > > > > >>>>>>>>> (e12aa38) >> > > > > >>>>>>>>> - >> > > > > math/src/main/java/org/apache/mahout/math/AbstractVector.java >> > > > > >>>>>>>>> (090aa7a) >> > > > > >>>>>>>>> - >> > math/src/main/java/org/apache/mahout/math/Centroid.java >> > > > > >>>>>>>>> (0c42196) >> > > > > >>>>>>>>> - >> > > > > math/src/main/java/org/apache/mahout/math/ConstantVector.java >> > > > > >>>>>>>>> (51d67d4) >> > > > > >>>>>>>>> - >> > > > > math/src/main/java/org/apache/mahout/math/DelegatingVector.java >> > > > > >>>>>>>>> (12220d4) >> > > > > >>>>>>>>> - >> > > math/src/main/java/org/apache/mahout/math/DenseVector.java >> > > > > >>>>>>>>> (41c356b) >> > > > > >>>>>>>>> - >> > > > > >> > > > >> > > >> > >> math/src/main/java/org/apache/mahout/math/FileBasedSparseBinaryMatrix.java >> > > > > >>>>>>>>> (094003b) >> > > > > >>>>>>>>> - >> > > math/src/main/java/org/apache/mahout/math/MatrixSlice.java >> > > > > >>>>>>>>> (7f79c96) >> > > > > >>>>>>>>> - >> > > > > math/src/main/java/org/apache/mahout/math/MatrixVectorView.java >> > > > > >>>>>>>>> (af70727) >> > > > > >>>>>>>>> - >> > > math/src/main/java/org/apache/mahout/math/NamedVector.java >> > > > > >>>>>>>>> (4b7a41d) >> > > > > >>>>>>>>> - >> > > > > >> > math/src/main/java/org/apache/mahout/math/OrderedIntDoubleMapping.java >> > > > > >>>>>>>>> (650d82d) >> > > > > >>>>>>>>> - >> > > > > math/src/main/java/org/apache/mahout/math/PermutedVectorView.java >> > > > > >>>>>>>>> (d1ea93a) >> > > > > >>>>>>>>> - >> > > > > >> > math/src/main/java/org/apache/mahout/math/RandomAccessSparseVector.java >> > > > > >>>>>>>>> (6f85692) >> > > > > >>>>>>>>> - >> > > > > >> > > > >> > > >> > >> math/src/main/java/org/apache/mahout/math/SequentialAccessSparseVector.java >> > > > > >>>>>>>>> (21982f9) >> > > > > >>>>>>>>> - >> math/src/main/java/org/apache/mahout/math/Vector.java >> > > > > >>>>>>>>> (2f8b417) >> > > > > >>>>>>>>> - >> > > math/src/main/java/org/apache/mahout/math/VectorView.java >> > > > > >>>>>>>>> (add2a60) >> > > > > >>>>>>>>> - >> > > > > math/src/main/java/org/apache/mahout/math/WeightedVector.java >> > > > > >>>>>>>>> (06fbd05) >> > > > > >>>>>>>>> - >> > > > > >> > > > >> > > >> > >> math/src/main/java/org/apache/mahout/math/function/DoubleDoubleFunction.java >> > > > > >>>>>>>>> (82b350a) >> > > > > >>>>>>>>> - >> > > > > math/src/main/java/org/apache/mahout/math/function/Functions.java >> > > > > >>>>>>>>> (eb2e42f) >> > > > > >>>>>>>>> - >> > > > > math/src/main/java/org/apache/mahout/math/function/PlusMult.java >> > > > > >>>>>>>>> (60587b1) >> > > > > >>>>>>>>> - >> > > > > >> math/src/main/java/org/apache/mahout/math/function/TimesFunction.java >> > > > > >>>>>>>>> (8ab0bb1) >> > > > > >>>>>>>>> - >> > > > > math/src/main/java/org/apache/mahout/math/jet/math/Constants.java >> > > > > >>>>>>>>> (53535d6) >> > > > > >>>>>>>>> - >> > > > > math/src/test/java/org/apache/mahout/math/AbstractVectorTest.java >> > > > > >>>>>>>>> (2b11199) >> > > > > >>>>>>>>> - >> > > math/src/test/java/org/apache/mahout/math/VectorTest.java >> > > > > >>>>>>>>> (d6d554b) >> > > > > >>>>>>>>> >> > > > > >>>>>>>>> View Diff <https://reviews.apache.org/r/10669/diff/> >> > > > > >>>>>>>>> >> > > > > >>>>>>>> >> > > > > >>>>>>>> >> > > > > >>>>>>> >> > > > > >>>>>> >> > > > > >>>>> >> > > > > >>>> >> > > > > >>> >> > > > > > >> > > > > >> > > > >> > > >> > >> > >
