[jira] Commented: (MAHOUT-165) Using better primitives hash for sparse vector for performance gains

Ted Dunning (JIRA) Wed, 30 Sep 2009 15:07:52 -0700

    [ https://issues.apache.org/jira/browse/MAHOUT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761021#action_12761021 ]


Ted Dunning commented on MAHOUT-165:
------------------------------------


Thanks Jake, that could be very helpful.

I throw "Impossible confusion" in situations where an impossible condition 
has been detected.  For instance, since hash tables are resized when they 
become partially filled, it should be impossible for the search loop to exit 
without finding either an empty cell or a match.  When programming, I have 
difficulty pronouncing "should", so I try to detect the situation and signal 
it with an unchecked exception.  I usually define something like 
"ImpossibleConditionException", but didn't in this case.  I use an unchecked 
exception because the application is clearly not going to be able to recover 
from a situation that I don't think could occur.
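
The pattern described above can be sketched as a small open-addressing map 
from int to double. Everything here is hypothetical: the class names 
`IntDoubleOpenMap` and `ImpossibleConditionException` (the latter is the kind 
of class the comment says is usually defined), the linear-probing scheme, and 
the 0.5 load factor are assumptions for illustration, not the code in the 
MAHOUT-165 patches. Because the table is resized whenever it becomes more than 
half full, the probe loop should always find an empty cell or a match; the 
final throw only fires if that invariant is broken.

```java
/** Hypothetical unchecked exception for states the code believes cannot occur. */
class ImpossibleConditionException extends RuntimeException {
  ImpossibleConditionException(String message) {
    super(message);
  }
}

/** Illustrative open-addressing int -> double map with linear probing (not Mahout's or Colt's code). */
class IntDoubleOpenMap {
  private static final double MAX_LOAD = 0.5; // assumed load factor
  private int[] keys;
  private double[] values;
  private boolean[] used;
  private int size;

  IntDoubleOpenMap() {
    allocate(8); // capacity is kept a power of two so masking works
  }

  private void allocate(int capacity) {
    keys = new int[capacity];
    values = new double[capacity];
    used = new boolean[capacity];
    size = 0;
  }

  /** Returns the slot holding key, or the first empty slot where it would go. */
  private int findSlot(int key) {
    int mask = keys.length - 1;
    int h = key * 0x9E3779B9;           // multiplicative hash to spread keys
    int start = (h ^ (h >>> 16)) & mask;
    for (int probe = 0; probe < keys.length; probe++) {
      int slot = (start + probe) & mask;
      if (!used[slot] || keys[slot] == key) {
        return slot;
      }
    }
    // Resizing keeps the table at most half full, so the loop above should
    // always return. Reaching this line means an invariant was broken.
    throw new ImpossibleConditionException("Impossible confusion: no empty slot for key " + key);
  }

  double get(int key) {
    int slot = findSlot(key);
    return used[slot] ? values[slot] : 0.0; // absent entries read as zero
  }

  void set(int key, double value) {
    int slot = findSlot(key);
    if (!used[slot]) {
      used[slot] = true;
      keys[slot] = key;
      size++;
    }
    values[slot] = value;
    if (size > keys.length * MAX_LOAD) {
      rehash(keys.length * 2);
    }
  }

  private void rehash(int newCapacity) {
    int[] oldKeys = keys;
    double[] oldValues = values;
    boolean[] oldUsed = used;
    allocate(newCapacity);
    for (int i = 0; i < oldKeys.length; i++) {
      if (oldUsed[i]) {
        set(oldKeys[i], oldValues[i]);
      }
    }
  }

  int size() {
    return size;
  }
}
```

Absent keys read as 0.0, which matches sparse-vector semantics but means `get` 
cannot distinguish a stored zero from a missing entry.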

I left the hard-coding of one option or the other in place because I could see 
my patch extending into everything everywhere and wanted to limit the scope of 
the change.  You are right that we need to think about how that works.  In most 
cases, I think that hard-coding is fine, just as hard-coding the use of an 
ArrayList in an application is not subject to user override.  There are a few 
cases where this isn't true, but I think that usually means that the vector or 
matrix should be passed in.  The use of like() may also be indicated.
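
One way to avoid hard-coding a concrete vector class inside a routine is to 
let the caller pass a vector in and derive the result type from it with 
like(). The sketch below uses a hypothetical, trimmed-down `SimpleVector` 
interface rather than Mahout's actual `Vector` API; `DenseVec`, `SparseVec`, 
and `VectorOps.scale` are illustrative names, and the sparse implementation 
uses `java.util.HashMap` only to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, trimmed-down vector interface; not Mahout's actual Vector API. */
interface SimpleVector {
  int size();
  double get(int index);
  void set(int index, double value);
  /** Returns an empty vector of the same concrete class and size. */
  SimpleVector like();
}

class DenseVec implements SimpleVector {
  private final double[] values;
  DenseVec(int size) { values = new double[size]; }
  public int size() { return values.length; }
  public double get(int i) { return values[i]; }
  public void set(int i, double v) { values[i] = v; }
  public SimpleVector like() { return new DenseVec(values.length); }
}

class SparseVec implements SimpleVector {
  private final int size;
  private final Map<Integer, Double> values = new HashMap<>(); // boxed, for brevity only
  SparseVec(int size) { this.size = size; }
  public int size() { return size; }
  public double get(int i) { return values.getOrDefault(i, 0.0); }
  public void set(int i, double v) {
    if (v == 0.0) values.remove(i); else values.put(i, v); // store non-zeros only
  }
  public SimpleVector like() { return new SparseVec(size); }
  int nonZeros() { return values.size(); }
}

final class VectorOps {
  private VectorOps() {}

  /** Scales a vector; the result type comes from like(), not from a hard-coded class. */
  static SimpleVector scale(SimpleVector v, double factor) {
    SimpleVector result = v.like();
    for (int i = 0; i < v.size(); i++) {
      double x = v.get(i);
      if (x != 0.0) {
        result.set(i, x * factor);
      }
    }
    return result;
  }
}
```

Note that `scale` walks every index for simplicity; a real sparse 
implementation would iterate only over the non-zero entries.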

> Using better primitives hash for sparse vector for performance gains
> --------------------------------------------------------------------
>
>                 Key: MAHOUT-165
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-165
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Matrix
>    Affects Versions: 0.2
>            Reporter: Shashikant Kore
>            Assignee: Grant Ingersoll
>             Fix For: 0.2
>
>         Attachments: colt.jar, mahout-165-trove.patch, 
> MAHOUT-165-updated.patch, MAHOUT-165.patch, mahout-165.patch
>
>
> In SparseVector, we need a primitives hash map for index and values. The 
> present implementation of this hash map is not as efficient as some of the 
> other implementations in non-Apache projects. 
> In an experiment, I found that, for get/set operations, the primitive hash of 
> Colt performs an order of magnitude better than OrderedIntDoubleMapping. 
> For iteration it is 2x slower, though. 
> Using Colt in SparseVector improved performance of canopy generation. For an 
> experimental dataset, the current implementation takes 50 minutes. Using 
> Colt reduces this duration to 19-20 minutes. That's a 60% reduction in 
> running time. 
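
The get/set versus iteration tradeoff reported above is what one would expect 
if the current mapping keeps indices in sorted parallel arrays while Colt uses 
an open-addressing hash. This thread does not show OrderedIntDoubleMapping's 
internals, so the sketch below (`SortedIntDoubleMap`, a hypothetical name) 
assumes a sorted-array layout purely to illustrate the costs: lookups are 
binary searches, O(log n); inserting a new index shifts the tail of both 
arrays, O(n); and iteration is a sequential scan in index order over two dense 
arrays.

```java
import java.util.Arrays;

/** Hypothetical sorted-array int -> double mapping; illustrates the cost model only. */
class SortedIntDoubleMap {
  private int[] indices = new int[8];
  private double[] values = new double[8];
  private int count;

  /** O(log n) lookup via binary search over the occupied prefix. */
  double get(int index) {
    int pos = Arrays.binarySearch(indices, 0, count, index);
    return pos >= 0 ? values[pos] : 0.0;
  }

  /** Updates in place, or inserts while keeping indices sorted. */
  void set(int index, double value) {
    int pos = Arrays.binarySearch(indices, 0, count, index);
    if (pos >= 0) {
      values[pos] = value;
      return;
    }
    int insertAt = -(pos + 1);
    if (count == indices.length) {
      indices = Arrays.copyOf(indices, count * 2);
      values = Arrays.copyOf(values, count * 2);
    }
    // Shift the tail right by one slot: O(n) per new index.
    System.arraycopy(indices, insertAt, indices, insertAt + 1, count - insertAt);
    System.arraycopy(values, insertAt, values, insertAt + 1, count - insertAt);
    indices[insertAt] = index;
    values[insertAt] = value;
    count++;
  }

  /** Sequential, cache-friendly scan in index order. */
  double sum() {
    double total = 0.0;
    for (int i = 0; i < count; i++) {
      total += values[i];
    }
    return total;
  }

  int[] keysInOrder() {
    return Arrays.copyOf(indices, count);
  }
}
```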
