I will pull your Patch in and take a look a close look at it tonight. Whats different in Run3 in Version7 v/s Run2 and Run1. Because Run3 looks really good and the regressions of 30% in some are actually not as bad and nothing no longer looks ridiculously blocking the patch.
On Mon, Apr 29, 2013 at 10:13 AM, Dan Filimon <[email protected]>wrote: > I uploaded the latest patch to my vector branch on github [0]. > > The latest revision is, as always, here [1], "Version 7 (Assign/Aggregate; > 1 sec runtime)". Tests pass for this one except for some aggregate tests > (it's not clear if aggregate should throw an exception when the vectors are > empty or return 0; I chose return 0). > > There's quite a bit of variance, a lot of it even in code that I haven't > changed between runs. So, norm2 shows regressions in version 7 and > improvements in version 6 and the code not having been changed. > > The dot product is also oscillating in version 7, showing different results > in the different runs (of the same algorithm). > > Except for some cleanup, I don't think I can do much better than this, so > could someone please also run the benchmarks and I'll update the patch on > ReviewBoard? > > [0] https://github.com/dfilimon/mahout/tree/vector > [1] > > https://docs.google.com/spreadsheet/ccc?key=0AochdzPoBmWodG9RTms1UG40YlNQd3ByUFpQY0FLWmc#gid=7 > > > On Sun, Apr 28, 2013 at 11:18 PM, Ted Dunning <[email protected]> > wrote: > > > It would *really* help to put a legend on these spreadsheets even if you > > have described it in the past. > > > > I take it that these are number of ops and bigger is better, right? > > > > And where is a description of what patch runs 1 and 2 actually are? > > > > > > > > On Sun, Apr 28, 2013 at 2:23 AM, Dan Filimon < > [email protected]>wrote: > > > >> Here's [1] the most recent benchmarks, after refactoring the Assign into > >> separate classes. > >> It's now using a 10 second runtime and a 1 second lead time and these > are > >> only the benchmarks involving assign(). > >> > >> Oddly enough there are regressions with RandomAccessSparseVectors' > plus() > >> and minus() despite these working like trunk from what I can tell from > the > >> tests (VectorBinaryAssignTest). > >> > >> Also, there's quite a bit of variance for some reason... Clone, Create > >> are just two of the odd ones. > >> > >> [1] > >> > https://docs.google.com/spreadsheet/ccc?key=0AochdzPoBmWodG9RTms1UG40YlNQd3ByUFpQY0FLWmc#gid=4 > >> > >> > >> On Fri, Apr 26, 2013 at 1:30 AM, Robin Anil <[email protected]> > wrote: > >> > >>> Do in place for rand an dense. Assign a bug to me to speed up rasv > >>> GetElement. > >>> On Apr 25, 2013 2:56 PM, "Dan Filimon" <[email protected]> > >>> wrote: > >>> > >>> > Nearly done splitting the code up, but I'm not sure what the costs > >>> should > >>> > ideally be. > >>> > > >>> > Robin, you proposed: cost of iteration + cost of lookup + cost of > >>> update > >>> > (if its not in-place) > >>> > > >>> > This sounds like it's per element, rather than for the entire vector. > >>> Also, > >>> > are we just going to assume that there will be as many updates as > >>> non-zeros > >>> > or as large as the array is? How does the cost of an update factor > in? > >>> > > >>> > And, is "in-place" just for DenseVectors? > >>> > > >>> > > >>> > On Thu, Apr 25, 2013 at 8:04 PM, Robin Anil <[email protected]> > >>> wrote: > >>> > > >>> > > Depends on the speed of copying a 1M double array v/s doing a > calloc > >>> + > >>> > > copying 1000 non zeros (Assuming java is doing that underneath). > >>> > > > >>> > > ------ > >>> > > Robin Anil > >>> > > > >>> > > > >>> > > > >>> > > On Thu, Apr 25, 2013 at 2:18 AM, Dan Filimon < > >>> > [email protected] > >>> > > >wrote: > >>> > > > >>> > > > Right, but is clone() generally slower than assigning? That > >>> strikes me > >>> > as > >>> > > > odd; doesn't Java optimize copying the internal structures (there > >>> are > >>> > > > arrays underneath after all)? > >>> > > > > >>> > > > > >>> > > > On Thu, Apr 25, 2013 at 10:14 AM, Robin Anil < > [email protected] > >>> > > >>> > > wrote: > >>> > > > > >>> > > >> Seems like for dense clone is slower than like().assign I need > to > >>> test > >>> > > it > >>> > > >> with different sizes to be sure. I kept it from the old > behavior. > >>> > > >> On Apr 25, 2013 2:12 AM, "Dan Filimon" < > >>> [email protected]> > >>> > > >> wrote: > >>> > > >> > >>> > > >>> Okay, so I should split it further into smaller sub-cases that > >>> handle > >>> > > >>> each Vector type. I tried making so that these match the cases > >>> in the > >>> > > >>> document to the extent possible. > >>> > > >>> You're right. It is ugly and I need to split it up. > >>> > > >>> > >>> > > >>> I removed the OrderedIntDoubleMapping (but with another if...) > >>> but > >>> > one > >>> > > >>> more thing that helped is making the assign() only go through > the > >>> > > non-zeros > >>> > > >>> when setting up a new copy. > >>> > > >>> > >>> > > >>> About the copying, I saw that the optimized copy doesn't always > >>> call > >>> > > >>> clone clone on the Vector. Why is this? > >>> > > >>> > >>> > > >>> Thanks for the test and the feedback! :) > >>> > > >>> > >>> > > >>> > >>> > > >>> On Thu, Apr 25, 2013 at 8:30 AM, Robin Anil < > >>> [email protected] > >>> > > >wrote: > >>> > > >>> > >>> > > >>>> Yes its reaching dev. I checked out your code to take a look. > I > >>> > > >>>> understand you are trying to still mix isSequentialAccess etc > >>> in the > >>> > > >>>> validity calculation. The whole point of this architecture is > to > >>> > > completely > >>> > > >>>> unroll things to be more modular instead of creating a giant > >>> > spaghetti > >>> > > >>>> > >>> > > >>>> Here is an example of what I am trying to say. > >>> > > >>>> > >>> > > >>>> This is your current implementation of > >>> AssignIterateOneLookupOther. > >>> > > >>>> There are so many conditionals > >>> > > >>>> > >>> > > >>>> > >>> > > >>>> 1. @Override > >>> > > >>>> 2. public Vector assign(Vector x, Vector y, > >>> > > >>>> DoubleDoubleFunction f) { > >>> > > >>>> 3. return !swap ? assignInner(x, y, f) : > >>> assignInner(y, x, > >>> > > f); > >>> > > >>>> 4. } > >>> > > >>>> 5. > >>> > > >>>> 6. public Vector assignInner(Vector x, Vector y, > >>> > > >>>> DoubleDoubleFunction f) { > >>> > > >>>> 7. Iterator<Vector.Element> xi = x.iterateNonZero(); > >>> > > >>>> 8. Vector.Element xe; > >>> > > >>>> 9. Vector.Element ye; > >>> > > >>>> 10. OrderedIntDoubleMapping updates = > >>> > > newOrderedIntDoubleMapping(); > >>> > > >>>> 11. while (xi.hasNext()) { > >>> > > >>>> 12. xe = xi.next(); > >>> > > >>>> 13. ye = y.getElement(xe.index()); > >>> > > >>>> 14. if (!swap) { > >>> > > >>>> 15. xe.set(f.apply(xe.get(), ye.get())); > >>> > > >>>> 16. } else { > >>> > > >>>> 17. if (ye.get() != 0.0 || > y.isAddConstantTime()) { > >>> > > >>>> 18. ye.set(f.apply(ye.get(), xe.get())); > >>> > > >>>> 19. } else { > >>> > > >>>> 20. updates.set(xe.index(), f.apply(ye.get(), > >>> > > >>>> xe.get())); > >>> > > >>>> 21. } > >>> > > >>>> 22. } > >>> > > >>>> 23. } > >>> > > >>>> 24. if (swap && !y.isAddConstantTime()) { > >>> > > >>>> 25. y.mergeUpdates(updates); > >>> > > >>>> 26. } > >>> > > >>>> 27. return swap ? y : x; > >>> > > >>>> 28. } > >>> > > >>>> 29. } > >>> > > >>>> > >>> > > >>>> Split this into. > >>> > > >>>> > >>> > > >>>> IterateThisLookupThatInplaceUpdate > >>> > > >>>> IterateThatLookupThisInplaceUpdate > >>> > > >>>> IterateThisLookupThatMergeUpdate > >>> > > >>>> IterateThatLookupThisMergeUpdate > >>> > > >>>> > >>> > > >>>> Removing the conditionals itself will speedup the operations > >>> you see > >>> > > >>>> regression on. > >>> > > >>>> > >>> > > >>>> Also note that getElement() is not constant Add time for RASV. > >>> In > >>> > > >>>> earlier version we tried at best to set in place. That reduced > >>> the > >>> > > extra > >>> > > >>>> hashkey computation. getElement in AbstractVector is really > >>> > optimized > >>> > > for > >>> > > >>>> DenseVector and it does indicate as the cause of regression > you > >>> are > >>> > > seeing > >>> > > >>>> in RASV. Another one is the unused new > >>> OrderedIntDoubleMapping(). > >>> > > Removing > >>> > > >>>> all these will bring back the loss. > >>> > > >>>> > >>> > > >>>> Now the cost for each can be computed as: cost of iteration + > >>> cost > >>> > of > >>> > > >>>> lookup + cost of update (if its not in-place) > >>> > > >>>> > >>> > > >>>> See the attached file for a mock test. You will have to rework > >>> the > >>> > > >>>> expectations but the framework should be understandable. > >>> > > >>>> > >>> > > >>>> > >>> > > >>>> ------ > >>> > > >>>> Robin Anil > >>> > > >>>> > >>> > > >>>> > >>> > > >>>> > >>> > > >>>> On Tue, Apr 23, 2013 at 4:09 AM, Dan Filimon < > >>> > > >>>> [email protected]> wrote: > >>> > > >>>> > >>> > > >>>>> Here's [1] a link to the "design document" of the new vector > >>> > > >>>>> operations so I can lay out the ideas behind what I'm doing > >>> more > >>> > > clearly. > >>> > > >>>>> > >>> > > >>>>> I'd like anyone who can to have a look and comment. > >>> > > >>>>> > >>> > > >>>>> Ted, > >>> > > >>>>> I know you'd like me to get back to working on clustering, > but > >>> > > >>>>> currently, BallKMeans is roughly 2x as slow as Mahout's > KMeans. > >>> > > >>>>> This is because of centroid.update() for one (it doesn't have > >>> > > >>>>> "interesting" properties that are exploited in the current) > and > >>> > also > >>> > > >>>>> because we're using we're just doing lots of Vector > >>> operations: dot > >>> > > >>>>> products for projections, hash codes for unique sets of > >>> candidates, > >>> > > >>>>> equality checks etc. > >>> > > >>>>> > >>> > > >>>>> Here's a snapshot of the hotspots for one run of BallKMeans > (20 > >>> > > >>>>> newsgroups, unprojected, i.e 20K sparse vectors with 200K > >>> > dimensions > >>> > > and > >>> > > >>>>> ~100 nonzero values): > >>> > > >>>>> Name Time (ms) > >>> org.apache.mahout.math.AbstractVector.dot(Vector) > >>> > > >>>>> 104069 org.apache.mahout.math.DelegatingVector.hashCode() > >>> 30159 > >>> > > org.apache.mahout.math.AbstractVector.assignIterateBoth(Iterator, > >>> > > >>>>> Iterator, DoubleDoubleFunction) 13720 > >>> > > >>>>> > >>> > > > >>> > > >>> > org.apache.mahout.math.OrderedIntDoubleMapping.merge(OrderedIntDoubleMapping) > >>> > > >>>>> 2911 java.lang.ClassLoader.loadClass(String, boolean) 2620 > >>> > > sun.misc.Launcher$AppClassLoader.loadClass(String, > >>> > > >>>>> boolean) 2620 > >>> > > >>>>> > >>> > > >>>>> > >>> > > >>>>> You can see that the most time is spent doing Vector > >>> operations. > >>> > > >>>>> Combined, the first three rows account for 91% of the > runtime. > >>> > > >>>>> > >>> > > >>>>> Thanks for your help! > >>> > > >>>>> > >>> > > >>>>> PS: Is this reaching dev@? > >>> > > >>>>> > >>> > > >>>>> [1] > >>> > > >>>>> > >>> > > > >>> > > >>> > https://docs.google.com/document/d/1g1PjUuvjyh2LBdq2_rKLIcUiDbeOORA1sCJiSsz-JVU/edit# > >>> > > >>>>> > >>> > > >>>>> > >>> > > >>>>> On Tue, Apr 23, 2013 at 7:53 AM, Robin Anil < > >>> [email protected] > >>> > > >wrote: > >>> > > >>>>> > >>> > > >>>>>> Few more brain dump ideas > >>> > > >>>>>> > >>> > > >>>>>> For any operation (assign, aggregate) you are trying to do. > >>> > > >>>>>> Compute set of "feasible" solutions: Based on all the > >>> properties > >>> > of > >>> > > >>>>>> functions > >>> > > >>>>>> Choose the solution that minimises the cost. > >>> > > >>>>>> > >>> > > >>>>>> For this you may need to refactor all the special kinds of > >>> > > iterations > >>> > > >>>>>> and aggregations subclassing from AssignOperation and > >>> > > AggregateOperation. > >>> > > >>>>>> Each operation would return the cost given two input vectors > >>> > (based > >>> > > on > >>> > > >>>>>> their cost properties) > >>> > > >>>>>> > >>> > > >>>>>> > >>> > > >>>>>> > >>> > > >>>>>> ------ > >>> > > >>>>>> Robin Anil > >>> > > >>>>>> > >>> > > >>>>>> > >>> > > >>>>>> > >>> > > >>>>>> On Mon, Apr 22, 2013 at 8:46 PM, Robin Anil < > >>> [email protected] > >>> > > >wrote: > >>> > > >>>>>> > >>> > > >>>>>>> Strike that. I think the code is very confusing. You will > >>> have > >>> > > >>>>>>> better luck if you create method for the three > >>> implementations. > >>> > It > >>> > > will > >>> > > >>>>>>> make things explicit and easy to debug. > >>> > > >>>>>>> > >>> > > >>>>>>> public final double getIterationCost() { > >>> > > >>>>>>> return one of {cardinality, 1.2*numNonDefault, > >>> numNonDefault} > >>> > > >>>>>>> } > >>> > > >>>>>>> > >>> > > >>>>>>> public final double getLookupCost() { > >>> > > >>>>>>> return one of {1, 1.5(lets say), log_2(numNonDefault)} > >>> > > >>>>>>> } > >>> > > >>>>>>> > >>> > > >>>>>>> All your logic then becomes a minimization of the total > cost. > >>> > > >>>>>>> > >>> > > >>>>>>> This is more extendable than current isDense(), > >>> isSequential(), > >>> > > >>>>>>> isSparse(), isRandom() and doesn't assume what the > underlying > >>> > > >>>>>>> implementation is. > >>> > > >>>>>>> > >>> > > >>>>>>> > >>> > > >>>>>>> ------ > >>> > > >>>>>>> Robin Anil > >>> > > >>>>>>> > >>> > > >>>>>>> > >>> > > >>>>>>> > >>> > > >>>>>>> On Mon, Apr 22, 2013 at 8:09 PM, Robin Anil < > >>> > [email protected] > >>> > > >wrote: > >>> > > >>>>>>> > >>> > > >>>>>>>> Also the calculation for cost is wrong. RASV is HashMap > >>> based > >>> > its > >>> > > >>>>>>>> cost O(n) not O(log n) > >>> > > >>>>>>>> > >>> > > >>>>>>>> ------ > >>> > > >>>>>>>> Robin Anil > >>> > > >>>>>>>> > >>> > > >>>>>>>> > >>> > > >>>>>>>> > >>> > > >>>>>>>> On Mon, Apr 22, 2013 at 4:43 PM, Robin Anil < > >>> > [email protected] > >>> > > >wrote: > >>> > > >>>>>>>> > >>> > > >>>>>>>>> This is an automatically generated e-mail. To reply, > >>> visit: > >>> > > >>>>>>>>> https://reviews.apache.org/r/10669/ > >>> > > >>>>>>>>> > >>> > math/src/main/java/org/apache/mahout/math/AbstractVector.java< > >>> > > > >>> > https://reviews.apache.org/r/10669/diff/2/?file=283185#file283185line56> > >>> > > (Diff > >>> > > >>>>>>>>> revision 2) > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> None > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> 55 > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> boolean commutativeAndAssociative = > >>> > > aggregator.isCommutative() && aggregator.isAssociative(); > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> This should be a method in AbstractFunction interface > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> > >>> > math/src/main/java/org/apache/mahout/math/AbstractVector.java< > >>> > > > >>> > > >>> > https://reviews.apache.org/r/10669/diff/2/?file=283185#file283185line1037 > >>> > > >>> > > (Diff > >>> > > >>>>>>>>> revision 2) > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> {'text': 'public int getNumNonZeroElements() {', 'line': > >>> 726, > >>> > > 'expand_offset': 2} > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> {'text': 'public int getNumNonZeroElements() {', 'line': > >>> 927, > >>> > > 'expand_offset': 2} > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> 942 > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> // Make all the non-zero values 0. > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> vector.clear() > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> > >>> > math/src/main/java/org/apache/mahout/math/AbstractVector.java< > >>> > > > >>> > > >>> > https://reviews.apache.org/r/10669/diff/2/?file=283185#file283185line1047 > >>> > > >>> > > (Diff > >>> > > >>>>>>>>> revision 2) > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> {'text': 'public int getNumNonZeroElements() {', 'line': > >>> 726, > >>> > > 'expand_offset': 2} > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> {'text': 'public int getNumNonZeroElements() {', 'line': > >>> 927, > >>> > > 'expand_offset': 2} > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> 952 > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> while (it.hasNext()) { > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> There is no guarantee of ordered iteration. The updates > >>> is > >>> > > unsorted. merge typically implies that input is sorted > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> > >>> > math/src/main/java/org/apache/mahout/math/AbstractVector.java< > >>> > > > >>> > > >>> > https://reviews.apache.org/r/10669/diff/2/?file=283185#file283185line1074 > >>> > > >>> > > (Diff > >>> > > >>>>>>>>> revision 2) > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> {'text': 'public int getNumNonZeroElements() {', 'line': > >>> 726, > >>> > > 'expand_offset': 2} > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> {'text': 'public int getNumNonZeroElements() {', 'line': > >>> 927, > >>> > > 'expand_offset': 2} > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> 979 > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> element.set(values[index]); > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> What if there are values that are not in the existing > >>> data. I > >>> > > dont this is correct > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> > >>> > math/src/main/java/org/apache/mahout/math/AbstractVector.java< > >>> > > > >>> > > >>> > https://reviews.apache.org/r/10669/diff/2/?file=283185#file283185line1100 > >>> > > >>> > > (Diff > >>> > > >>>>>>>>> revision 2) > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> {'text': 'public int getNumNonZeroElements() {', 'line': > >>> 726, > >>> > > 'expand_offset': 2} > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> {'text': 'public int getNumNonZeroElements() {', 'line': > >>> 927, > >>> > > 'expand_offset': 2} > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> 999 > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> invalidateCachedLength(); > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> move invalidateCachedLength to the set method. Instead > of > >>> > > dispersing it everywhere. > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> > >>> > math/src/main/java/org/apache/mahout/math/AbstractVector.java< > >>> > > > >>> > > >>> > https://reviews.apache.org/r/10669/diff/2/?file=283185#file283185line1115 > >>> > > >>> > > (Diff > >>> > > >>>>>>>>> revision 2) > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> None > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> {'text': ' private Vector > >>> assignIterateBoth(Iterator<Element> > >>> > > thisIterator, Iterator<Element> thatIterator,', 'line': 1070} > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> 1014 > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> protected Vector > >>> assignSkipZerosIterateOne(Iterator<Element> > >>> > > thisIterator, Vector other, > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> be consistent, this that or thisIterator that Iterator. > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> > >>> > math/src/main/java/org/apache/mahout/math/AbstractVector.java< > >>> > > > >>> > > >>> > https://reviews.apache.org/r/10669/diff/2/?file=283185#file283185line1171 > >>> > > >>> > > (Diff > >>> > > >>>>>>>>> revision 2) > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> None > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> {'text': ' private Vector > >>> assignIterateBoth(Iterator<Element> > >>> > > thisIterator, Iterator<Element> thatIterator,', 'line': 1070} > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> 1070 > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> private Vector assignIterateBoth(Iterator<Element> > >>> > > thisIterator, Iterator<Element> thatIterator, > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> This expects the data to to be sorted. indicate that > in a > >>> > > comment. Similarly comment other places > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> > >>> > > math/src/main/java/org/apache/mahout/math/function/Functions.java< > >>> > > > >>> > https://reviews.apache.org/r/10669/diff/2/?file=283202#file283202line900 > >>> > > > >>> > > (Diff > >>> > > >>>>>>>>> revision 2) > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> None > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> {'text': ' public boolean isLikeLeftMult() {', > 'line': > >>> > 892} > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> 896 > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> /** > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> Move this comments to the base class > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> - Robin > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> On April 22nd, 2013, 8:45 p.m., Dan Filimon wrote: > >>> > > >>>>>>>>> Review request for mahout, Ted Dunning, Sebastian > >>> Schelter, > >>> > and > >>> > > >>>>>>>>> Robin Anil. > >>> > > >>>>>>>>> By Dan Filimon. > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> *Updated April 22, 2013, 8:45 p.m.* > >>> > > >>>>>>>>> Description > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> This patch contains code cleaning up AbstractVector and > >>> making > >>> > > the operations as fast as possible while still having a high level > >>> > > interface. > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> The main changes are in AbstractVector as well as new > >>> methods > >>> > in > >>> > > DoubleDoubleFunction. > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> Testing > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> The vectors test pass but it's likely that the patch in > >>> it's > >>> > > current state is broken as other, unrelated tests (BallKMeans...) > are > >>> > > failing. > >>> > > >>>>>>>>> Also, my Hadoop conf is broken so I didn't run all the > core > >>> > > tests. Anyone? > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> I can't seem to find the bug, so _please_ have a closer > >>> look. > >>> > > It's still a work in progress. > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> The benchmarks seem comparable (although there are some > >>> jarring > >>> > > diferences – Minkowski distance seems a lot slower in new-dan-1 > than > >>> > > old-trunk-2). It may be however that this is just variance due to > the > >>> > load > >>> > > of the machine at the time. I'm having trouble interpreting the > >>> > benchmarks > >>> > > in general, so anyone who could give me a hand is more than > welcome. > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> Diffs > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> - > >>> > > math/src/main/java/org/apache/mahout/math/AbstractMatrix.java > >>> > > >>>>>>>>> (e12aa38) > >>> > > >>>>>>>>> - > >>> > > math/src/main/java/org/apache/mahout/math/AbstractVector.java > >>> > > >>>>>>>>> (090aa7a) > >>> > > >>>>>>>>> - > >>> math/src/main/java/org/apache/mahout/math/Centroid.java > >>> > > >>>>>>>>> (0c42196) > >>> > > >>>>>>>>> - > >>> > > math/src/main/java/org/apache/mahout/math/ConstantVector.java > >>> > > >>>>>>>>> (51d67d4) > >>> > > >>>>>>>>> - > >>> > > math/src/main/java/org/apache/mahout/math/DelegatingVector.java > >>> > > >>>>>>>>> (12220d4) > >>> > > >>>>>>>>> - > >>> math/src/main/java/org/apache/mahout/math/DenseVector.java > >>> > > >>>>>>>>> (41c356b) > >>> > > >>>>>>>>> - > >>> > > > >>> > > >>> > math/src/main/java/org/apache/mahout/math/FileBasedSparseBinaryMatrix.java > >>> > > >>>>>>>>> (094003b) > >>> > > >>>>>>>>> - > >>> math/src/main/java/org/apache/mahout/math/MatrixSlice.java > >>> > > >>>>>>>>> (7f79c96) > >>> > > >>>>>>>>> - > >>> > > math/src/main/java/org/apache/mahout/math/MatrixVectorView.java > >>> > > >>>>>>>>> (af70727) > >>> > > >>>>>>>>> - > >>> math/src/main/java/org/apache/mahout/math/NamedVector.java > >>> > > >>>>>>>>> (4b7a41d) > >>> > > >>>>>>>>> - > >>> > > > >>> math/src/main/java/org/apache/mahout/math/OrderedIntDoubleMapping.java > >>> > > >>>>>>>>> (650d82d) > >>> > > >>>>>>>>> - > >>> > > math/src/main/java/org/apache/mahout/math/PermutedVectorView.java > >>> > > >>>>>>>>> (d1ea93a) > >>> > > >>>>>>>>> - > >>> > > > >>> math/src/main/java/org/apache/mahout/math/RandomAccessSparseVector.java > >>> > > >>>>>>>>> (6f85692) > >>> > > >>>>>>>>> - > >>> > > > >>> > > >>> > math/src/main/java/org/apache/mahout/math/SequentialAccessSparseVector.java > >>> > > >>>>>>>>> (21982f9) > >>> > > >>>>>>>>> - > math/src/main/java/org/apache/mahout/math/Vector.java > >>> > > >>>>>>>>> (2f8b417) > >>> > > >>>>>>>>> - > >>> math/src/main/java/org/apache/mahout/math/VectorView.java > >>> > > >>>>>>>>> (add2a60) > >>> > > >>>>>>>>> - > >>> > > math/src/main/java/org/apache/mahout/math/WeightedVector.java > >>> > > >>>>>>>>> (06fbd05) > >>> > > >>>>>>>>> - > >>> > > > >>> > > >>> > math/src/main/java/org/apache/mahout/math/function/DoubleDoubleFunction.java > >>> > > >>>>>>>>> (82b350a) > >>> > > >>>>>>>>> - > >>> > > math/src/main/java/org/apache/mahout/math/function/Functions.java > >>> > > >>>>>>>>> (eb2e42f) > >>> > > >>>>>>>>> - > >>> > > math/src/main/java/org/apache/mahout/math/function/PlusMult.java > >>> > > >>>>>>>>> (60587b1) > >>> > > >>>>>>>>> - > >>> > > > math/src/main/java/org/apache/mahout/math/function/TimesFunction.java > >>> > > >>>>>>>>> (8ab0bb1) > >>> > > >>>>>>>>> - > >>> > > math/src/main/java/org/apache/mahout/math/jet/math/Constants.java > >>> > > >>>>>>>>> (53535d6) > >>> > > >>>>>>>>> - > >>> > > math/src/test/java/org/apache/mahout/math/AbstractVectorTest.java > >>> > > >>>>>>>>> (2b11199) > >>> > > >>>>>>>>> - > >>> math/src/test/java/org/apache/mahout/math/VectorTest.java > >>> > > >>>>>>>>> (d6d554b) > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> View Diff <https://reviews.apache.org/r/10669/diff/> > >>> > > >>>>>>>>> > >>> > > >>>>>>>> > >>> > > >>>>>>>> > >>> > > >>>>>>> > >>> > > >>>>>> > >>> > > >>>>> > >>> > > >>>> > >>> > > >>> > >>> > > > > >>> > > > >>> > > >>> > >> > >> > > >
