[ 
https://issues.apache.org/jira/browse/MAHOUT-208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789201#action_12789201
 ] 

Jake Mannix commented on MAHOUT-208:
------------------------------------

bq. Alternative to maintaining caching flag is to use the hashcode of 
underlying constructs. For example, in case of SparseVector, we could use 
OpenIntDoubleHashMap.hashCode() to see if the cached value is still valid. In 
case of DenseVectors, hashcode of arrays can be used.

Does this really work?  hashCode() is nearly as expensive to compute as 
lengthNorm() itself. Unless I'm blanking on some fancy thing the JVM does to 
cache hashcodes and invalidate them when the underlying data changes, you 
would have to do a hashCode() check on every call to see whether lengthNorm() 
needs recomputing. That takes nearly twice the time in the case where there 
was a mutation, and O(numNonZeroEntries) time instead of O(1) when there 
wasn't.
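
To make the cost argument concrete, here is a minimal sketch in plain Java. 
HashCheckSketch and its methods are hypothetical names for illustration, not 
Mahout code. It assumes the "hashcode of arrays" idea would have to mean 
java.util.Arrays.hashCode(double[]), since calling hashCode() directly on an 
array gives the identity hash, which does not depend on the contents at all.

```java
import java.util.Arrays;

// Hypothetical helper, not Mahout code: contrasts identity and content hashes.
final class HashCheckSketch {
    private HashCheckSketch() {}

    /** Content hash: visits every element, so it is O(n), like the norm itself. */
    static int contentHash(double[] values) {
        return Arrays.hashCode(values);
    }

    /** Squared Euclidean length: also a single pass over the data. */
    static double lengthSquared(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v * v;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] values = {3.0, 4.0};
        int identityBefore = values.hashCode();
        int contentBefore = contentHash(values);

        values[0] = 6.0; // mutate in place

        // The identity hash is unchanged, so it cannot detect the mutation.
        System.out.println(identityBefore == values.hashCode());
        // The content hash differs for this input, but only after a full pass
        // over the array, and equal hashes would not prove equal contents.
        System.out.println(contentBefore != contentHash(values));
        System.out.println(lengthSquared(values));
    }
}
```

Either way the check walks the whole array (or every non-zero entry of a 
sparse map) before the norm can be trusted, which is the extra pass described 
above.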

> Vector.getLengthSquared() is dangerously optimized
> --------------------------------------------------
>
>                 Key: MAHOUT-208
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-208
>             Project: Mahout
>          Issue Type: Bug
>          Components: Matrix
>    Affects Versions: 0.1
>         Environment: all
>            Reporter: Jake Mannix
>            Assignee: Sean Owen
>             Fix For: 0.3
>
>
> SparseVector and DenseVector both cache the value of lengthSquared, so that 
> subsequent calls to it get the cached value.  Great, except the cache is 
> never cleared - calls to set/setQuick or assign or anything, all leave the 
> cached value unchanged.  
> Mutating method calls should set lengthNorm to -1 so that the cache is 
> cleared.
> This could be a really nasty bug if hit.
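
A rough sketch of the invalidation described in the issue, assuming a 
simplified dense vector. CachedLengthVector is a hypothetical stand-in, not 
the actual Mahout SparseVector or DenseVector, and the field here is named 
lengthSquared for clarity rather than lengthNorm.

```java
// Hypothetical stand-in for a dense vector; not the actual Mahout class.
final class CachedLengthVector {
    private final double[] values;
    // A negative value means "not computed"; a squared length is never negative.
    private double lengthSquared = -1.0;

    CachedLengthVector(int size) {
        values = new double[size];
    }

    /** Mutator: writes the value and clears the cache in O(1). */
    void set(int index, double value) {
        values[index] = value;
        lengthSquared = -1.0;
    }

    /** Returns the cached squared length, recomputing only after a mutation. */
    double getLengthSquared() {
        if (lengthSquared >= 0.0) {
            return lengthSquared;
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v * v;
        }
        lengthSquared = sum;
        return sum;
    }
}
```

Every mutating method would need the same one-line reset; a cached read stays 
O(1) and the full pass happens only on the first read after a change.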

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
