[ https://issues.apache.org/jira/browse/MAHOUT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779121#action_12779121 ]
Jake Mannix commented on MAHOUT-165:
------------------------------------

bq. Perhaps he would be willing to donate Colt to Apache? I don't think we can just bring in its source and claim it as ours. Another option is we see if he would move it over to Google Code and make some of us committers on the project. Perhaps Commons Math is interested in it, too.

I asked him, and he said that we can go ahead and have it; he's not maintaining it anymore. It's Apache-licensed, so can't we take it, regardless of whether he was contactable or not, as long as we attribute him and abide by the license he put on the code:

Packages cern.colt* , cern.jet*, cern.clhep

Copyright (c) 1999 CERN - European Organization for Nuclear Research.
Permission to use, copy, modify, distribute and sell this software and its documentation for any purpose is hereby granted without fee, provided that the above copyright notice appear in all copies and that both that copyright notice and this permission notice appear in supporting documentation. CERN makes no representations about the suitability of this software for any purpose. It is provided "as is" without expressed or implied warranty.

But either way, he did say we could have it.

Commons Math was interested at some point, as well as being interested in MTJ, but they seem to have abandoned the idea of incorporating anyone else's primitives anytime soon, as they are not willing to break their backwards-compatibility requirements until 3.0 (and since 2.0 just came out a few months ago, we're talking a looooong time).

Putting it on Google Code is an interesting option: it would resurrect the project from the perspective of people outside of Mahout... I kinda like that idea. Would doing this slow down our ability to use it?
> Using better primitives hash for sparse vector for performance gains
> --------------------------------------------------------------------
>
>                 Key: MAHOUT-165
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-165
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Matrix
>   Affects Versions: 0.2
>            Reporter: Shashikant Kore
>            Assignee: Grant Ingersoll
>             Fix For: 0.3
>
>         Attachments: colt.jar, mahout-165-trove.patch, MAHOUT-165-updated.patch, mahout-165.patch, MAHOUT-165.patch, mahout-165.patch
>
>
> In SparseVector, we need a primitives hash map for index and values. The
> present implementation of this hash map is not as efficient as some of the
> other implementations in non-Apache projects.
> In an experiment, I found that, for get/set operations, the primitive hash of
> Colt performs an order of magnitude better than OrderedIntDoubleMapping.
> For iteration it is 2x slower, though.
> Using Colt in SparseVector improved performance of canopy generation. For an
> experimental dataset, the current implementation takes 50 minutes. Using
> Colt reduces this duration to 19-20 minutes. That's a 60% reduction in the
> delay.
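
For readers unfamiliar with the term, here is a minimal sketch of what a "primitives hash map" from int index to double value can look like. It is not Colt's code and not anything from the attached patches: the class name `IntDoubleOpenHashMap`, the linear-probing scheme, the power-of-two capacity, and the reserved sentinel key are all assumptions made for illustration. It only shows why get/set can avoid boxing and run in expected constant time.

```java
import java.util.Arrays;

/** Hypothetical sketch: open-addressing int -> double map with linear probing. */
final class IntDoubleOpenHashMap {
    // Sentinel marking an unused slot; as a consequence this key cannot be stored.
    private static final int FREE = Integer.MIN_VALUE;

    private int[] keys;
    private double[] values;
    private int size;

    IntDoubleOpenHashMap(int expectedSize) {
        int capacity = 8;
        // Power-of-two capacity, at least twice the expected size (load factor <= 0.5).
        while (capacity < expectedSize * 2) {
            capacity <<= 1;
        }
        keys = new int[capacity];
        values = new double[capacity];
        Arrays.fill(keys, FREE);
    }

    /** Returns the slot holding key, or the free slot where it would be inserted. */
    private int slot(int key) {
        int mask = keys.length - 1;
        int h = key * 0x9E3779B9;            // multiplicative scrambling of the key
        int i = (h ^ (h >>> 16)) & mask;
        while (keys[i] != FREE && keys[i] != key) {
            i = (i + 1) & mask;              // linear probing
        }
        return i;
    }

    /** Absent entries read as 0.0, which is what a sparse vector expects. */
    double get(int key) {
        int i = slot(key);
        return keys[i] == key ? values[i] : 0.0;
    }

    void put(int key, double value) {
        if (key == FREE) {
            throw new IllegalArgumentException("reserved key");
        }
        int i = slot(key);
        if (keys[i] == FREE) {
            if ((size + 1) * 2 > keys.length) {
                grow();
                i = slot(key);
            }
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    int size() {
        return size;
    }

    /** Doubles the tables and reinserts every stored entry. */
    private void grow() {
        int[] oldKeys = keys;
        double[] oldValues = values;
        keys = new int[oldKeys.length * 2];
        values = new double[oldValues.length * 2];
        Arrays.fill(keys, FREE);
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldKeys[j] != FREE) {
                int i = slot(oldKeys[j]);
                keys[i] = oldKeys[j];
                values[i] = oldValues[j];
            }
        }
    }
}
```

In a layout like this, iterating has to walk every slot, including the empty ones, while a packed sorted-array mapping only touches stored entries. That is one plausible reason for the slower iteration reported in the issue, though the actual cause was not measured here.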