[ 
https://issues.apache.org/jira/browse/MAHOUT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated MAHOUT-165:
-----------------------------------

    Attachment: MAHOUT-165-colt.patch

The Colt stuff looks good. My only concern, oddly enough, is a legal one: the 
name. I don't think we should call it Colt. AFAICT, that name is owned by CERN, 
and while the license allows us to bring over the code, it doesn't give us 
rights to the name.

This patch changes the name to matrix and adds the appropriate legal bits to 
NOTICE.txt and LICENSE.txt.

This just covers the Colt stuff; it does not apply Shashi's patch.

It seems like we should just move our Matrix (currently in core) out to this 
package and make core depend on this module.

> Using better primitives hash for sparse vector for performance gains
> --------------------------------------------------------------------
>
>                 Key: MAHOUT-165
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-165
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Matrix
>    Affects Versions: 0.2
>            Reporter: Shashikant Kore
>            Assignee: Grant Ingersoll
>             Fix For: 0.3
>
>         Attachments: colt.jar, mahout-165-18nov-updated.patch, 
> mahout-165-18nov.patch, MAHOUT-165-colt.patch, mahout-165-trove.patch, 
> MAHOUT-165-updated.patch, MAHOUT-165-with-colt-module.patch, 
> MAHOUT-165-with-colt.patch, mahout-165.patch, MAHOUT-165.patch, 
> mahout-165.patch
>
>
> In SparseVector, we need a primitive hash map for indices and values. The 
> present implementation of this hash map is not as efficient as some of the 
> implementations in non-Apache projects. 
> In an experiment, I found that, for get/set operations, Colt's primitive 
> hash map performs an order of magnitude better than OrderedIntDoubleMapping. 
> For iteration it is 2x slower, though. 
> Using Colt in SparseVector improved the performance of canopy generation. 
> For an experimental dataset, the current implementation takes 50 minutes. 
> Using Colt reduces this to 19-20 minutes. That's a 60% reduction in running 
> time. 
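
For illustration only, here is a minimal sketch of the kind of primitive 
int-to-double hash map the description is talking about: open addressing with 
linear probing over plain arrays, so no Integer/Double boxing is needed on 
get/set. This is not Colt's code and not Mahout's OrderedIntDoubleMapping. The 
class name IntDoubleOpenHashMap and its methods are hypothetical, and removal 
is left out to keep it short.

```java
// Hypothetical sketch: int -> double map using open addressing (linear probing).
// Missing keys read as 0.0, which matches sparse-vector semantics.
final class IntDoubleOpenHashMap {
    private int[] keys;
    private double[] values;
    private boolean[] used;   // true where a slot holds a key
    private int size;

    IntDoubleOpenHashMap(int expectedSize) {
        int capacity = 8;
        // Power-of-two capacity keeps the load factor at or below 0.5.
        while (capacity < expectedSize * 2) {
            capacity <<= 1;
        }
        keys = new int[capacity];
        values = new double[capacity];
        used = new boolean[capacity];
    }

    // Returns the slot holding 'key', or the first free slot on its probe path.
    private int slot(int key) {
        int mask = keys.length - 1;
        int h = key * 0x9E3779B9;            // multiplicative scramble
        int i = (h ^ (h >>> 16)) & mask;
        while (used[i] && keys[i] != key) {
            i = (i + 1) & mask;              // linear probing
        }
        return i;
    }

    double get(int key) {
        int i = slot(key);
        return used[i] ? values[i] : 0.0;
    }

    void put(int key, double value) {
        int i = slot(key);
        if (!used[i]) {
            used[i] = true;
            keys[i] = key;
            size++;
        }
        values[i] = value;
        if (size * 2 > keys.length) {
            resize();                        // keep at least half the slots free
        }
    }

    int size() {
        return size;
    }

    // Doubles the table and re-inserts every stored entry.
    private void resize() {
        int[] oldKeys = keys;
        double[] oldValues = values;
        boolean[] oldUsed = used;
        int capacity = oldKeys.length * 2;
        keys = new int[capacity];
        values = new double[capacity];
        used = new boolean[capacity];
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldUsed[j]) {
                int i = slot(oldKeys[j]);
                used[i] = true;
                keys[i] = oldKeys[j];
                values[i] = oldValues[j];
            }
        }
    }
}
```

A lookup or update touches a few adjacent array slots on average. Iteration, 
by contrast, has to scan the whole table, including the empty slots, which may 
be one reason a hash-based map iterates more slowly than a densely packed 
ordered mapping.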

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
