Re: [jira] Commented: (MAHOUT-165) Using better primitives hash for sparse vector for performance gains

Grant Ingersoll Sat, 29 Aug 2009 14:29:36 -0700

Right, Colt likely could be used, depending on the package it comes from and as long as it doesn't have deps on the other packages.


-Grant


On Aug 29, 2009, at 2:22 PM, Ted Dunning wrote:

Trove is LGPL so we can't lift code.  Even linking can be tricky.
On Fri, Aug 28, 2009 at 10:06 AM, Shashikant Kore (JIRA) <j...@apache.org> wrote:
  [ https://issues.apache.org/jira/browse/MAHOUT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748904#action_12748904 ]
Shashikant Kore commented on MAHOUT-165:
----------------------------------------

I'm fine with copying relevant classes from Colt or Trove.
Please let me know your library of choice. I will create the patch and
upload it.
Using better primitives hash for sparse vector for performance gains
--------------------------------------------------------------------

               Key: MAHOUT-165
               URL: https://issues.apache.org/jira/browse/MAHOUT-165
           Project: Mahout
        Issue Type: Improvement
        Components: Matrix
  Affects Versions: 0.2
          Reporter: Shashikant Kore
           Fix For: 0.2

       Attachments: mahout-165.patch
In SparseVector, we need a primitives hash map for index and values. The
present implementation of this hash map is not as efficient as some of the
other implementations in non-Apache projects.
In an experiment, I found that, for get/set operations, the primitive
hash of Colt performs an order of magnitude better than
OrderedIntDoubleMapping. For iteration it is 2x slower, though.
Using Colt in SparseVector improved performance of canopy generation. For
an experimental dataset, the current implementation takes 50 minutes. Using
Colt reduces this duration to 19-20 minutes. That's a 60% reduction in the
delay.
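
For readers unfamiliar with the idea, a primitives hash map of the kind
discussed above can be sketched roughly as follows. This is only a minimal
illustration, assuming open addressing with linear probing and non-negative
vector indices. The class name IntDoubleOpenHashMap and all of its methods
are hypothetical; none of this is taken from Colt, Trove, or Mahout.

```java
import java.util.Arrays;

/** Hypothetical int -> double map; an illustration, not library code. */
final class IntDoubleOpenHashMap {
    // Assumption: vector indices are non-negative, so -1 can mark a free slot.
    private static final int FREE = -1;

    private int[] keys;       // primitive arrays: no boxing, no Entry objects
    private double[] values;
    private int size;

    IntDoubleOpenHashMap(int expectedSize) {
        int capacity = 16;
        while (capacity < expectedSize * 2) {
            capacity <<= 1;   // keep capacity a power of two
        }
        keys = new int[capacity];
        values = new double[capacity];
        Arrays.fill(keys, FREE);
    }

    /** Returns the slot holding key, or the free slot where it would go. */
    private int slot(int key) {
        int mask = keys.length - 1;
        int h = key * 0x9E3779B9;          // multiplicative scrambling
        int i = (h ^ (h >>> 16)) & mask;
        while (keys[i] != FREE && keys[i] != key) {
            i = (i + 1) & mask;            // linear probing
        }
        return i;
    }

    /** Unset indices read as 0.0, as in a sparse vector. */
    double get(int key) {
        if (key < 0) {
            return 0.0;
        }
        int i = slot(key);
        return keys[i] == key ? values[i] : 0.0;
    }

    void set(int key, double value) {
        if (key < 0) {
            throw new IllegalArgumentException("negative index: " + key);
        }
        int i = slot(key);
        if (keys[i] == FREE) {
            // Keep the load factor at or below 0.5 so probing always terminates.
            if ((size + 1) * 2 > keys.length) {
                grow();
                i = slot(key);
            }
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    int size() {
        return size;
    }

    private void grow() {
        int[] oldKeys = keys;
        double[] oldValues = values;
        keys = new int[oldKeys.length * 2];
        values = new double[keys.length];
        Arrays.fill(keys, FREE);
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldKeys[j] != FREE) {
                int i = slot(oldKeys[j]);
                keys[i] = oldKeys[j];
                values[i] = oldValues[j];
            }
        }
    }
}
```

A get or set here costs one hash computation plus a short probe over a flat
int array, which is one plausible reason a hash-based mapping could beat an
ordered one on random access. Iteration has to skip free slots and yields
indices in no particular order, which may relate to the slower iteration
reported above.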

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
--
Ted Dunning, CTO
DeepDyve


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:

http://www.lucidimagination.com/search
