I'm +1 on this.

-------- Original message --------
From: Suneel Marthi <smar...@apache.org>
Date: 03/07/2016 8:09 PM (GMT-05:00)
To: mahout <dev@mahout.apache.org>
Subject: Re: [jira] [Commented] (MAHOUT-1640) Better collections would significantly improve vector-operation speed

If @apalumbo, @pferrel et al. vote for it now, we should merge the patch into 0.11.2 master and the 0.12.0 branch. No need to wait for 3 days.

Again, +1 from me. Thanks @vigna, and sorry about missing this; my focus has been on the 0.12.0 Flink integration.

On Mon, Mar 7, 2016 at 8:06 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

> ok standard 3 days then.
>
> On Mon, Mar 7, 2016 at 5:04 PM, ASF GitHub Bot (JIRA) <j...@apache.org> wrote:
>
> > [ https://issues.apache.org/jira/browse/MAHOUT-1640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184122#comment-15184122 ]
> >
> > ASF GitHub Bot commented on MAHOUT-1640:
> > ----------------------------------------
> >
> > Github user smarthi commented on the pull request:
> >
> > https://github.com/apache/mahout/pull/81#issuecomment-193536262
> >
> > Seems like it's ASL 2.0 -
> > https://github.com/vigna/fastutil/blob/master/LICENSE-2.0
> >
> > +1 from me, good to go.
> >
> > On Mon, Mar 7, 2016 at 7:21 PM, Dmitriy Lyubimov <notificati...@github.com> wrote:
> >
> > > @vigna <https://github.com/vigna> is fastutil 0.7.2 still the best version to use? I can't immediately find the license on it.
> > > @smarthi <https://github.com/smarthi> et al.: need a few votes on inclusion of fastutil as a dependency
> > >
> > > Better collections would significantly improve vector-operation speed
> > > ---------------------------------------------------------------------
> > >
> > > Key: MAHOUT-1640
> > > URL: https://issues.apache.org/jira/browse/MAHOUT-1640
> > > Project: Mahout
> > > Issue Type: Improvement
> > > Components: collections
> > > Environment: Darwin lithium.local 14.1.0 Darwin Kernel Version 14.1.0: Mon Dec 22 23:10:38 PST 2014; root:xnu-2782.10.72~2/RELEASE_X86_64 x86_64 i386 MacBookPro10,1 Darwin
> > > java version "1.8.0_31"
> > > Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
> > > Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)
> > > Reporter: Sebastiano Vigna
> > > Assignee: Suneel Marthi
> > > Labels: legacy, math, scala
> > > Attachments: fastutil.patch, speed-fastutil, speed-std
> > >
> > >
> > > The collections currently used by Mahout to implement sparse vectors are extremely slow. The proposed patch (localized to RandomAccessSparseVector) uses fastutil's maps and the speed improvements in vector benchmarks are very significant. It would be interesting to see whether these improvements percolate to high-level classes using sparse vectors.
> > > I had to patch two unit tests (an off-by-one bug and an overfitting bug; both were exposed by the different order in which key/values were returned by iterators).
> > > The included files speed-std and speed-fastutil show the speed improvement. Some more speed might be gained by using the standard java.util.Map.Entry interface everywhere instead of Element.
> > > DISCLAIMER: The "Times" set of tests has been run multiplying two identical vectors. The standard tests multiply two random vectors, so in fact they just test the speed of the underlying map remove() method, as almost all products are zero. This is not very realistic and was heavily penalizing fastutil's "true deletions". Better tests, with a typical overlap of nonzero entries, would be even more realistic.
> >
> > --
> > This message was sent by Atlassian JIRA
> > (v6.3.4#6332)
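
To make the DISCLAIMER in the quoted issue concrete, here is a minimal sketch of why multiplying two random sparse vectors mostly exercises remove(). It is not Mahout's or fastutil's code: it uses plain java.util.HashMap as a stand-in for a sparse vector, and the class name SparseOverlapSketch and the methods randomSparse and timesInPlace are hypothetical names chosen for illustration. It also assumes an in-place product that deletes entries whose product is zero, which is what the issue text describes.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Random;

public class SparseOverlapSketch {

    // Hypothetical stand-in for a sparse vector: index -> nonzero value.
    // Requires nonZeros <= cardinality, otherwise the loop cannot finish.
    static Map<Integer, Double> randomSparse(int cardinality, int nonZeros, Random rnd) {
        Map<Integer, Double> v = new HashMap<>();
        while (v.size() < nonZeros) {
            // Values lie in [1, 2), so no stored entry is ever zero.
            v.put(rnd.nextInt(cardinality), 1.0 + rnd.nextDouble());
        }
        return v;
    }

    // Multiplies a by b element-wise, in place. An index missing from b has
    // an implicit value of zero, so the product is zero and the entry is
    // removed from a. Returns how many entries were removed.
    static int timesInPlace(Map<Integer, Double> a, Map<Integer, Double> b) {
        int removed = 0;
        Iterator<Map.Entry<Integer, Double>> it = a.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Integer, Double> e = it.next();
            Double other = b.get(e.getKey());
            if (other == null) {
                it.remove();
                removed++;
            } else {
                e.setValue(e.getValue() * other);
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        int cardinality = 1_000_000;
        int nonZeros = 1_000;

        Map<Integer, Double> x = randomSparse(cardinality, nonZeros, rnd);
        Map<Integer, Double> y = randomSparse(cardinality, nonZeros, rnd);
        Map<Integer, Double> xCopy = new HashMap<>(x);

        // Two independent random vectors share very few indices.
        System.out.println("random pair, removals:    " + timesInPlace(x, y));
        // Two identical vectors share every index, so nothing is removed.
        System.out.println("identical pair, removals: "
                + timesInPlace(xCopy, new HashMap<>(xCopy)));
    }
}
```

With 1,000 nonzeros out of 1,000,000 possible indices, two independent random vectors are expected to share roughly one index, so nearly every entry in the random pair is removed, while the identical pair removes nothing. A benchmark with a controlled, partial overlap would sit between these two extremes.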