[ https://issues.apache.org/jira/browse/MAHOUT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751423#action_12751423 ]
Sean Owen commented on MAHOUT-165:
----------------------------------

While I'm not a lawyer, I am all but certain there is no distinction between distributing a .jar and distributing a class -- in fact, distributing source typically carries more restrictions. So I am pretty sure we can't use Trove in any form if its license is not compatible.

Colt appears to license its code in two parts, and the part we need is licensed compatibly. To be completely safe, we'd need to copy only the part that is suitably licensed. Whether that means repacking the .jar, copying source, or something else is up to us.

Others -- how's my interpretation?

> Using better primitives hash for sparse vector for performance gains
> --------------------------------------------------------------------
>
>                 Key: MAHOUT-165
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-165
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Matrix
>    Affects Versions: 0.2
>            Reporter: Shashikant Kore
>             Fix For: 0.2
>
>         Attachments: mahout-165-trove.patch, mahout-165.patch
>
>
> In SparseVector, we need a primitive hash map for indices and values. The
> present implementation of this hash map is not as efficient as some of the
> implementations in non-Apache projects.
> In an experiment, I found that, for get/set operations, the primitive hash of
> Colt performs an order of magnitude better than OrderedIntDoubleMapping.
> For iteration it is 2x slower, though.
> Using Colt in SparseVector improved the performance of canopy generation. For an
> experimental dataset, the current implementation takes 50 minutes. Using
> Colt reduces this duration to 19-20 minutes. That's a 60% reduction in
> run time.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.