[ https://issues.apache.org/jira/browse/MAHOUT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779121#action_12779121 ]

Jake Mannix commented on MAHOUT-165:
------------------------------------

bq. Perhaps he would be willing to donate Colt to Apache? I don't think we can 
just bring in its source and claim it as ours. Another option is to see if he 
would move it over to Google Code and make some of us committers on the 
project. Perhaps Commons Math is interested in it, too.

I asked him, and he said that we can go ahead and have it, since he's not 
maintaining it anymore. It's under a permissive license, so couldn't we take it 
regardless of whether he was reachable, as long as we attribute him and abide by 
the license he put on the code? That license reads:

    Packages cern.colt* , cern.jet*, cern.clhep

    Copyright (c) 1999 CERN - European Organization for Nuclear Research.

    Permission to use, copy, modify, distribute and sell this software and its
    documentation for any purpose is hereby granted without fee, provided that the
    above copyright notice appear in all copies and that both that copyright notice
    and this permission notice appear in supporting documentation. CERN makes no
    representations about the suitability of this software for any purpose. It is
    provided "as is" without expressed or implied warranty.

But either way, he did say we could have it.

Commons Math was interested at some point, as it also was in MTJ, but they seem 
to have abandoned the idea of incorporating anyone else's primitives anytime 
soon: they are not willing to break their backward-compatibility requirements 
until 3.0, and since 2.0 came out only a few months ago, we're talking a 
looooong time.

Putting it on Google Code is an interesting option: it would resurrect the 
project from the perspective of people outside Mahout. I kinda like that idea. 
Would doing this slow down our ability to use it?

> Using better primitives hash for sparse vector for performance gains
> --------------------------------------------------------------------
>
>                 Key: MAHOUT-165
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-165
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Matrix
>    Affects Versions: 0.2
>            Reporter: Shashikant Kore
>            Assignee: Grant Ingersoll
>             Fix For: 0.3
>
>         Attachments: colt.jar, mahout-165-trove.patch, 
> MAHOUT-165-updated.patch, mahout-165.patch, MAHOUT-165.patch, mahout-165.patch
>
>
> In SparseVector, we need a primitive hash map for indices and values. The 
> present implementation of this hash map is not as efficient as some of the 
> other implementations in non-Apache projects. 
> In an experiment, I found that, for get/set operations, Colt's primitive hash 
> map performs an order of magnitude better than OrderedIntDoubleMapping. 
> For iteration it is 2x slower, though. 
> Using Colt in SparseVector improved the performance of canopy generation. For 
> an experimental dataset, the current implementation takes 50 minutes; using 
> Colt reduces this to 19-20 minutes, roughly a 60% reduction in running time.
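The get/set vs. iteration trade-off reported in the issue is roughly what you would expect from the two data-structure layouts involved. Below is a minimal, hypothetical Java sketch, assuming OrderedIntDoubleMapping behaves like a pair of sorted parallel arrays and that Colt's map is an open-addressing hash table. Neither class is the actual Mahout or Colt code: the class names, the linear-probing scheme, the multiplicative hash, and the 0.5 load factor are all illustrative choices.

```java
import java.util.Arrays;

/** Hypothetical sketch of two int -> double layouts; not Mahout's or Colt's actual code. */
public class SparseMapSketch {

  /** Sorted parallel arrays: binary-search lookup, O(n) shift for new entries, dense in-order scan. */
  static final class SortedArrayMapping {
    private int[] indices = new int[8];
    private double[] values = new double[8];
    private int size;

    double get(int index) {
      int pos = Arrays.binarySearch(indices, 0, size, index);
      return pos >= 0 ? values[pos] : 0.0;
    }

    void set(int index, double value) {
      int pos = Arrays.binarySearch(indices, 0, size, index);
      if (pos >= 0) {
        values[pos] = value; // existing entry: overwrite in place
        return;
      }
      int insertAt = -(pos + 1);
      if (size == indices.length) {
        indices = Arrays.copyOf(indices, size * 2);
        values = Arrays.copyOf(values, size * 2);
      }
      // New entry: shift the tail right by one slot (the O(n) part).
      System.arraycopy(indices, insertAt, indices, insertAt + 1, size - insertAt);
      System.arraycopy(values, insertAt, values, insertAt + 1, size - insertAt);
      indices[insertAt] = index;
      values[insertAt] = value;
      size++;
    }

    double sum() {
      double s = 0.0;
      for (int i = 0; i < size; i++) s += values[i]; // live entries only, stored contiguously
      return s;
    }
  }

  /** Open addressing with linear probing: expected O(1) get/set, but iteration scans empty slots. */
  static final class OpenAddressingMap {
    private int[] keys = new int[16]; // capacity stays a power of two
    private double[] vals = new double[16];
    private boolean[] used = new boolean[16];
    private int size;

    private int findSlot(int key) {
      int mask = keys.length - 1;
      int h = key * 0x9E3779B9; // multiplicative hash spreads consecutive indices
      int i = (h ^ (h >>> 16)) & mask;
      while (used[i] && keys[i] != key) i = (i + 1) & mask; // linear probing
      return i;
    }

    double get(int key) {
      int i = findSlot(key);
      return used[i] ? vals[i] : 0.0;
    }

    void set(int key, double value) {
      int i = findSlot(key);
      if (!used[i]) {
        if ((size + 1) * 2 > keys.length) { // keep load factor at or below 0.5
          grow();
          i = findSlot(key);
        }
        used[i] = true;
        keys[i] = key;
        size++;
      }
      vals[i] = value;
    }

    private void grow() {
      int[] oldKeys = keys;
      double[] oldVals = vals;
      boolean[] oldUsed = used;
      keys = new int[oldKeys.length * 2];
      vals = new double[oldKeys.length * 2];
      used = new boolean[oldKeys.length * 2];
      size = 0;
      for (int j = 0; j < oldKeys.length; j++) {
        if (oldUsed[j]) set(oldKeys[j], oldVals[j]); // rehash into the larger table
      }
    }

    double sum() {
      double s = 0.0;
      for (int j = 0; j < keys.length; j++) if (used[j]) s += vals[j]; // visits every slot
      return s;
    }
  }
}
```

In the sorted layout, every get() pays for a binary search and every set() on a new index shifts all later entries, while iteration is a tight scan over live values. The hash table makes get/set roughly constant time, but iteration must also check the empty slots (up to half the table at this load factor). That is consistent with, though not proof of, the faster get/set and 2x slower iteration described above.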

-- 
This message is automatically generated by JIRA.