[ https://issues.apache.org/jira/browse/MAHOUT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779335#action_12779335 ]
Jake Mannix commented on MAHOUT-165:
------------------------------------

Awesome, thanks Drew. I noticed you didn't add a {code}<module>colt</module>{code} line inside the top-level pom; is this to hide it from being depended on? I ask because, without that line, IntelliJ didn't seem to want to consider the mahout-colt submodule a real Maven submodule.

So what are the next steps here? If we drop this into Mahout now, we can take advantage of it very quickly, which would be great. Having it be its own Maven submodule means it should be fairly easy to pull it out of Mahout entirely later, if that is the desire, right?

Grant, what are your thoughts on this? I would love to see this package up on google-code/sourceforge/github, because then other projects could use it as well without having to depend on Mahout. I just don't want to slow down the process of solidifying our linear primitives here in Mahout-land. The sooner we do that, the better.

> Using better primitives hash for sparse vector for performance gains
> --------------------------------------------------------------------
>
>                 Key: MAHOUT-165
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-165
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Matrix
>    Affects Versions: 0.2
>            Reporter: Shashikant Kore
>            Assignee: Grant Ingersoll
>             Fix For: 0.3
>
>         Attachments: colt.jar, mahout-165-trove.patch, MAHOUT-165-updated.patch,
>                      MAHOUT-165-with-colt-module.patch, MAHOUT-165-with-colt.patch,
>                      mahout-165.patch, MAHOUT-165.patch, mahout-165.patch
>
>
> In SparseVector, we need a primitive hash map for indices and values. The
> present implementation of this hash map is not as efficient as some of the
> other implementations in non-Apache projects.
> In an experiment, I found that, for get/set operations, Colt's primitive hash
> map performs an order of magnitude better than OrderedIntDoubleMapping.
> For iteration, though, it is 2x slower.
> Using Colt in SparseVector improved the performance of canopy generation. For an
> experimental dataset, the current implementation takes 50 minutes; using
> Colt reduces this to 19-20 minutes, a reduction of about 60%.

--
This message is automatically generated by JIRA.
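To make the trade-off in the quoted issue concrete (much faster get/set, roughly 2x slower iteration), here is a minimal sketch contrasting two int-to-double maps a sparse vector could use. It assumes OrderedIntDoubleMapping works as sorted parallel index/value arrays searched by binary search, and it uses a generic open-addressing table with linear probing as a stand-in for Colt's primitive maps (such as `OpenIntDoubleHashMap`). The class names `SparseMapSketch`, `SortedArrayMap` and `OpenHashMap` are hypothetical; this is not code from Mahout or Colt.

```java
import java.util.Arrays;

// Hypothetical sketch: two int -> double maps a sparse vector could use.
public class SparseMapSketch {

    // Sorted parallel arrays (assumed to resemble OrderedIntDoubleMapping).
    static final class SortedArrayMap {
        private int[] indices = new int[8];
        private double[] values = new double[8];
        private int size = 0;

        double get(int index) { // O(log n) binary search; absent index reads as 0.0
            int pos = Arrays.binarySearch(indices, 0, size, index);
            return pos >= 0 ? values[pos] : 0.0;
        }

        void set(int index, double value) { // a new index costs an O(n) tail shift
            int pos = Arrays.binarySearch(indices, 0, size, index);
            if (pos >= 0) { values[pos] = value; return; }
            int at = -(pos + 1); // insertion point reported by binarySearch
            if (size == indices.length) {
                indices = Arrays.copyOf(indices, size * 2);
                values = Arrays.copyOf(values, size * 2);
            }
            System.arraycopy(indices, at, indices, at + 1, size - at);
            System.arraycopy(values, at, values, at + 1, size - at);
            indices[at] = index;
            values[at] = value;
            size++;
        }

        double sum() { // iteration: exactly `size` contiguous entries, in index order
            double s = 0.0;
            for (int i = 0; i < size; i++) s += values[i];
            return s;
        }
    }

    // Open addressing with linear probing (generic sketch, not Colt's code).
    static final class OpenHashMap {
        private int[] keys = new int[16];      // capacity is always a power of two
        private double[] vals = new double[16];
        private boolean[] used = new boolean[16];
        private int size = 0;

        private static int mix(int key) {
            int h = key * 0x9E3779B1;          // multiplicative hashing
            return h ^ (h >>> 16);
        }

        private int findSlot(int key) {        // slot holding key, or the free slot for it
            int mask = keys.length - 1;
            int i = mix(key) & mask;
            while (used[i] && keys[i] != key) i = (i + 1) & mask;
            return i;
        }

        double get(int key) {                  // expected O(1): hash plus a short probe
            int i = findSlot(key);
            return used[i] ? vals[i] : 0.0;
        }

        void set(int key, double value) {
            int i = findSlot(key);
            if (!used[i]) { keys[i] = key; used[i] = true; size++; }
            vals[i] = value;
            if (size * 2 > keys.length) rehash(keys.length * 2); // load factor <= 0.5
        }

        private void rehash(int capacity) {
            int[] oldKeys = keys;
            double[] oldVals = vals;
            boolean[] oldUsed = used;
            keys = new int[capacity];
            vals = new double[capacity];
            used = new boolean[capacity];
            for (int j = 0; j < oldKeys.length; j++) {
                if (!oldUsed[j]) continue;
                int i = findSlot(oldKeys[j]);
                keys[i] = oldKeys[j];
                vals[i] = oldVals[j];
                used[i] = true;
            }
        }

        double sum() { // iteration: scans every slot, empty ones included, unordered
            double s = 0.0;
            for (int j = 0; j < keys.length; j++) if (used[j]) s += vals[j];
            return s;
        }
    }

    public static void main(String[] args) {
        SortedArrayMap sorted = new SortedArrayMap();
        OpenHashMap hashed = new OpenHashMap();
        int[] idx = {42, 7, 1000, 3, 7};
        double[] val = {1.5, 2.0, 0.25, 4.0, 3.0};
        for (int k = 0; k < idx.length; k++) {
            sorted.set(idx[k], val[k]);
            hashed.set(idx[k], val[k]);
        }
        System.out.println(sorted.get(7) + " " + hashed.get(7)); // 3.0 3.0
        System.out.println(sorted.sum() + " " + hashed.sum());   // 8.75 8.75
    }
}
```

In this sketch, get/set on the hash table avoid both the binary search and the array shift on insertion, which is one plausible source of the get/set speedup. Iteration, by contrast, must walk every slot of a table kept at most half full and visits indices out of order, which fits the slower iteration the issue reports.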