[jira] Commented: (MAHOUT-165) Using better primitives hash for sparse vector for performance gains

Jake Mannix (JIRA) Tue, 17 Nov 2009 22:53:07 -0800

    [ 
https://issues.apache.org/jira/browse/MAHOUT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779335#action_12779335
 ]


Jake Mannix commented on MAHOUT-165:
------------------------------------

Awesome, thanks Drew.  I noticed you didn't add a 
{code}<module>colt</module>{code} line to the top-level pom.  Is this to 
hide it from being depended on?  I only ask because, without that line, 
IntelliJ didn't seem to want to treat the mahout-colt submodule as a real 
Maven submodule.
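
For reference, a minimal sketch of where that line would sit, assuming the 
submodule directory is named colt and leaving the existing module entries 
elided (this is an illustration, not the contents of the attached patch):

```xml
<!-- Top-level pom.xml (sketch): only the colt entry comes from this thread -->
<modules>
  <!-- ...existing module entries unchanged... -->
  <module>colt</module>
</modules>
```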

So what are the next steps here?  If we drop this into Mahout now, we can take 
advantage of it very quickly, which would be great.  Having it be its own 
Maven submodule means that it should be fairly easy to pull it out of Mahout 
entirely later, if that is the desire, right?  Grant, what are your thoughts 
on this?

I would love to see this package up on google-code/sourceforge/github, because 
then other projects could use it as well, without having to depend on Mahout.  
I just don't want to slow down the process of solidifying our linear primitives 
here in Mahout-land.  The sooner we do that, the better.

> Using better primitives hash for sparse vector for performance gains
> --------------------------------------------------------------------
>
>                 Key: MAHOUT-165
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-165
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Matrix
>    Affects Versions: 0.2
>            Reporter: Shashikant Kore
>            Assignee: Grant Ingersoll
>             Fix For: 0.3
>
>         Attachments: colt.jar, mahout-165-trove.patch, 
> MAHOUT-165-updated.patch, MAHOUT-165-with-colt-module.patch, 
> MAHOUT-165-with-colt.patch, mahout-165.patch, MAHOUT-165.patch, 
> mahout-165.patch
>
>
> In SparseVector, we need a primitive hash map for indices and values. The 
> present implementation of this hash map is not as efficient as some of the 
> other implementations in non-Apache projects. 
> In an experiment, I found that, for get/set operations, the primitive hash 
> map of Colt performs an order of magnitude better than 
> OrderedIntDoubleMapping. For iteration it is 2x slower, though. 
> Using Colt in SparseVector improved the performance of canopy generation. 
> For an experimental dataset, the current implementation takes 50 minutes. 
> Using Colt reduces this duration to 19-20 minutes. That's a 60% reduction in 
> running time. 
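
To make the "primitive hash" idea in the issue description concrete, here is a 
rough sketch of an int-to-double open-addressing map with linear probing.  It 
is not Colt's code and not OrderedIntDoubleMapping; the class name 
IntDoubleOpenHashMap, the load factor of 1/2, and the choice to return 0.0 for 
missing keys (as a sparse vector would) are all assumptions made for 
illustration, and removal is left out to keep it short.

```java
/** Illustrative int -> double hash map: open addressing, linear probing, no removal. */
final class IntDoubleOpenHashMap {
    private int[] keys;
    private double[] values;
    private boolean[] used;   // true where a slot currently holds an entry
    private int size;

    IntDoubleOpenHashMap(int expectedSize) {
        int capacity = 8;
        // Capacity is a power of two so that "& mask" can replace "% length".
        while (capacity < 2 * expectedSize) {
            capacity <<= 1;
        }
        allocate(capacity);
    }

    private void allocate(int capacity) {
        keys = new int[capacity];
        values = new double[capacity];
        used = new boolean[capacity];
    }

    /** Returns the slot holding key, or the free slot where it would be inserted. */
    private int slot(int key) {
        int h = key * 0x9E3779B9;              // multiplicative scramble of the key
        int mask = keys.length - 1;
        int i = (h ^ (h >>> 16)) & mask;
        while (used[i] && keys[i] != key) {
            i = (i + 1) & mask;                // linear probing
        }
        return i;
    }

    void put(int key, double value) {
        int i = slot(key);
        if (!used[i]) {
            if (2 * (size + 1) > keys.length) { // keep the table at most half full
                grow();
                i = slot(key);
            }
            used[i] = true;
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    /** Returns the stored value, or 0.0 if the key is absent. */
    double get(int key) {
        int i = slot(key);
        return used[i] ? values[i] : 0.0;
    }

    int size() {
        return size;
    }

    private void grow() {
        int[] oldKeys = keys;
        double[] oldValues = values;
        boolean[] oldUsed = used;
        allocate(oldKeys.length * 2);
        // Reinsert every live entry into the larger table; size is unchanged.
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldUsed[j]) {
                int i = slot(oldKeys[j]);
                used[i] = true;
                keys[i] = oldKeys[j];
                values[i] = oldValues[j];
            }
        }
    }

    public static void main(String[] args) {
        IntDoubleOpenHashMap vector = new IntDoubleOpenHashMap(4);
        vector.put(42, 3.5);
        vector.put(1000000, -1.0);
        System.out.println(vector.get(42) + " " + vector.get(7) + " " + vector.size());
    }
}
```

Keys and values live in flat primitive arrays, so get and put avoid boxing 
and pointer chasing.  Iteration has to scan every slot, including empty ones, 
which is one plausible reason a structure like this could iterate more slowly 
than an ordered mapping.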

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
