[jira] Commented: (MAHOUT-165) Using better primitives hash for sparse vector for performance gains

Sean Owen (JIRA) Fri, 04 Sep 2009 02:28:31 -0700

    [ https://issues.apache.org/jira/browse/MAHOUT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751385#action_12751385 ]


Sean Owen commented on MAHOUT-165:
----------------------------------

Wait a sec, I thought we had concluded that we *cannot* use Trove. It was Colt 
that had a portion which was licensed acceptably.

Are you saying these errors occur before your change? I don't see these
failures in head.

The first error -- I can't tell you why it happens, but I can explain it a bit
more, if that's what you're asking. Zero and negative zero are actually
different doubles: they differ only in the sign bit. Primitive == treats them
as equal, but Double.equals() and Double.compare() do not, so a check built on
those (for example, comparing boxed Double values) reports a mismatch. Somehow
the computation has changed in your patch such that a result ends up as zero,
but actually negative zero. One might say the test should not compare doubles
for exact equality at all, but for equality within a small tolerance (to the
last decimal place or so). But I don't see how this change should have
affected this result, period, so it should probably be viewed as a problem with
the patch, with Trove, or with some funky interaction between them.

It sounds like Gson can't serialize/deserialize the Trove class correctly
because of some circular reference among its instances. Not sure why that
would be a problem.

But I think all this is moot since we can't use Trove?

> Using better primitives hash for sparse vector for performance gains
> --------------------------------------------------------------------
>
>                 Key: MAHOUT-165
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-165
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Matrix
>    Affects Versions: 0.2
>            Reporter: Shashikant Kore
>             Fix For: 0.2
>
>         Attachments: mahout-165-trove.patch, mahout-165.patch
>
>
> In SparseVector, we need a primitive hash map for indices and values. The
> present implementation of this hash map is not as efficient as some
> implementations in other, non-Apache projects.
> In an experiment, I found that for get/set operations, Colt's primitive hash
> map performs an order of magnitude better than OrderedIntDoubleMapping.
> For iteration, though, it is 2x slower.
> Using Colt in SparseVector improved the performance of canopy generation. For
> an experimental dataset, the current implementation takes 50 minutes; using
> Colt reduces this to 19-20 minutes, roughly a 60% reduction in running time.
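
A minimal sketch, assuming a linear-probing design, of what a primitive
int-to-double hash map looks like: keys, values and slot states live in
parallel primitive arrays, so get/set never create boxed Integer/Double
objects. The class IntDoubleOpenHashMap is hypothetical; it is not Colt's
OpenIntDoubleHashMap (which uses open addressing with double hashing) and not
Mahout's OrderedIntDoubleMapping.

```java
/**
 * Hypothetical sketch: open-addressing map from int indices to double values.
 * Collisions are resolved by linear probing; removals leave tombstones so
 * that later probe chains stay intact. Absent indices read as 0.0, as in a
 * sparse vector, and storing 0.0 removes the entry.
 */
public class IntDoubleOpenHashMap {
    private static final byte FREE = 0;
    private static final byte FULL = 1;
    private static final byte REMOVED = 2;

    private int[] keys;
    private double[] values;
    private byte[] state;
    private int size; // number of FULL slots
    private int used; // FULL + REMOVED slots; drives resizing

    public IntDoubleOpenHashMap(int expectedSize) {
        // Round 2 * expectedSize up to a power of two (minimum 4).
        int target = Math.max(4, expectedSize * 2);
        allocate(Integer.highestOneBit(target - 1) << 1);
    }

    private void allocate(int capacity) {
        keys = new int[capacity];
        values = new double[capacity];
        state = new byte[capacity];
        size = 0;
        used = 0;
    }

    // Mixes the key's bits, then masks into the power-of-two table.
    private int slotFor(int key) {
        int h = key * 0x9E3779B9;
        return (h ^ (h >>> 16)) & (keys.length - 1);
    }

    public double get(int key) {
        int mask = keys.length - 1;
        for (int i = slotFor(key); state[i] != FREE; i = (i + 1) & mask) {
            if (state[i] == FULL && keys[i] == key) {
                return values[i];
            }
        }
        return 0.0; // absent index reads as zero
    }

    public void set(int key, double value) {
        if (value == 0.0) {
            remove(key);
            return;
        }
        int mask = keys.length - 1;
        int firstRemoved = -1;
        int i = slotFor(key);
        while (state[i] != FREE) {
            if (state[i] == FULL && keys[i] == key) {
                values[i] = value; // overwrite existing entry
                return;
            }
            if (state[i] == REMOVED && firstRemoved < 0) {
                firstRemoved = i;
            }
            i = (i + 1) & mask;
        }
        if (firstRemoved >= 0) {
            i = firstRemoved; // reuse a tombstone; 'used' is unchanged
        } else {
            used++;
        }
        keys[i] = key;
        values[i] = value;
        state[i] = FULL;
        size++;
        // Keep occupied + tombstone slots at or below 75% of capacity.
        if (used * 4 > keys.length * 3) {
            rehash(keys.length * 2);
        }
    }

    public void remove(int key) {
        int mask = keys.length - 1;
        for (int i = slotFor(key); state[i] != FREE; i = (i + 1) & mask) {
            if (state[i] == FULL && keys[i] == key) {
                state[i] = REMOVED; // tombstone, not FREE, to keep chains intact
                size--;
                return;
            }
        }
    }

    private void rehash(int newCapacity) {
        int[] oldKeys = keys;
        double[] oldValues = values;
        byte[] oldState = state;
        allocate(newCapacity);
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldState[i] == FULL) {
                set(oldKeys[i], oldValues[i]);
            }
        }
    }

    public int size() {
        return size;
    }
}
```

With this layout get/set cost one hash plus a short probe over primitive
arrays. Iterating the non-zero entries, by contrast, means scanning every slot
(free ones included), and the indices come out unsorted; a mapping that keeps
indices sorted in dense parallel arrays can iterate them directly in order,
which is one plausible reason for the slower iteration reported above.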

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
