[ 
https://issues.apache.org/jira/browse/MAHOUT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760898#action_12760898
 ] 

Grant Ingersoll commented on MAHOUT-165:
----------------------------------------

There are some thoughts on equals, etc. in the archives and other JIRA issues. 

Here's what I recall:

1. We want DenseVectors and SparseVectors with the same names to be equal in 
the equals() sense.  The implementations of equals in SparseVector and 
DenseVector are equivalent, AFAICT, to the implementations in equals(), except 
for #2:
2. We don't just defer to strictEquivalence b/c the thinking is that we can do 
a much faster equals comparison if we know what type of vector it is, which is 
why SparseVector checks to see if "that" is a SparseVector, otherwise deferring 
to equivalent (since names have already been checked).  I haven't validated 
whether the type-specific comparisons truly are faster.
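
To make the pattern in #1 and #2 concrete, here is a rough sketch of what I 
mean. It is not the actual Mahout code: Vec, DenseVec, SparseVec and the 
helper methods are hypothetical stand-ins, and the sparse storage uses a boxed 
HashMap only to keep the example short. Each equals() checks the name first, 
takes a type-specific fast path when "that" is the same kind of vector, and 
otherwise falls back to a generic element-by-element comparison.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Illustrative stand-ins only: Vec, DenseVec and SparseVec are hypothetical,
// not Mahout's actual Vector, DenseVector and SparseVector classes.
class VectorEqualsSketch {

  interface Vec {
    String name();
    int size();
    double get(int index);
  }

  // Type-agnostic comparison: same size and same value at every index.
  static boolean equivalent(Vec a, Vec b) {
    if (a.size() != b.size()) return false;
    for (int i = 0; i < a.size(); i++) {
      if (Double.compare(a.get(i), b.get(i)) != 0) return false;
    }
    return true;
  }

  // Coarse hash on name and size only, so an equal dense/sparse pair
  // always hashes alike.
  static int hash(Vec v) {
    return 31 * Objects.hashCode(v.name()) + v.size();
  }

  static final class DenseVec implements Vec {
    private final String name;
    private final double[] values;

    DenseVec(String name, double... values) {
      this.name = name;
      this.values = values.clone();
    }

    public String name() { return name; }
    public int size() { return values.length; }
    public double get(int index) { return values[index]; }

    @Override
    public boolean equals(Object o) {
      if (this == o) return true;
      if (!(o instanceof Vec)) return false;
      Vec that = (Vec) o;
      if (!Objects.equals(name, that.name())) return false;
      if (that instanceof DenseVec) {
        // Fast path: both dense, compare the backing arrays directly.
        return Arrays.equals(values, ((DenseVec) that).values);
      }
      return equivalent(this, that); // names have already been checked
    }

    @Override
    public int hashCode() { return hash(this); }
  }

  static final class SparseVec implements Vec {
    private final String name;
    private final int size;
    // Boxed map for brevity; a primitive int->double map would go here.
    private final Map<Integer, Double> entries = new HashMap<>();

    SparseVec(String name, int size) {
      this.name = name;
      this.size = size;
    }

    // Zeros are never stored, so equal vectors have equal entry maps.
    void set(int index, double value) {
      if (value == 0.0) entries.remove(index); else entries.put(index, value);
    }

    public String name() { return name; }
    public int size() { return size; }
    public double get(int index) { return entries.getOrDefault(index, 0.0); }

    @Override
    public boolean equals(Object o) {
      if (this == o) return true;
      if (!(o instanceof Vec)) return false;
      Vec that = (Vec) o;
      if (!Objects.equals(name, that.name())) return false;
      if (that instanceof SparseVec) {
        // Fast path: both sparse, compare only the stored entries.
        SparseVec other = (SparseVec) that;
        return size == other.size && entries.equals(other.entries);
      }
      return equivalent(this, that); // names have already been checked
    }

    @Override
    public int hashCode() { return hash(this); }
  }

  public static void main(String[] args) {
    DenseVec dense = new DenseVec("v", 0.0, 1.5, 0.0, 2.0);
    SparseVec sparse = new SparseVec("v", 4);
    sparse.set(1, 1.5);
    sparse.set(3, 2.0);
    System.out.println(dense.equals(sparse));
    System.out.println(sparse.equals(dense));
    System.out.println(dense.equals(new DenseVec("w", 0.0, 1.5, 0.0, 2.0)));
  }
}
```

Whether the fast paths actually pay off would still need measuring. The 
sketch only shows how the three cases (same type, mixed types, different 
names) fit together while keeping equals() symmetric and hashCode() consistent 
across the two vector types.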


> Using better primitives hash for sparse vector for performance gains
> --------------------------------------------------------------------
>
>                 Key: MAHOUT-165
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-165
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Matrix
>    Affects Versions: 0.2
>            Reporter: Shashikant Kore
>            Assignee: Grant Ingersoll
>             Fix For: 0.2
>
>         Attachments: colt.jar, mahout-165-trove.patch, MAHOUT-165.patch, 
> mahout-165.patch
>
>
> In SparseVector, we need a primitive hash map for indices and values. The 
> present implementation of this hash map is not as efficient as some of the 
> implementations in non-Apache projects. 
> In an experiment, I found that, for get/set operations, the primitive hash 
> map of Colt performs an order of magnitude better than 
> OrderedIntDoubleMapping. For iteration it is 2x slower, though. 
> Using Colt in SparseVector improved the performance of canopy generation. 
> For an experimental dataset, the current implementation takes 50 minutes. 
> Using Colt reduces this duration to 19-20 minutes. That's a 60% reduction 
> in running time. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
