[ https://issues.apache.org/jira/browse/MAHOUT-208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789201#action_12789201 ]
Jake Mannix commented on MAHOUT-208:
------------------------------------

bq. Alternative to maintaining caching flag is to use the hashcode of underlying constructs. For example, in case of SparseVector, we could use OpenIntDoubleHashMap.hashCode() to see if the cached value is still valid. In case of DenseVectors, hashcode of arrays can be used.

Does this really work? hashCode() is nearly as expensive to compute as lengthNorm() itself. Unless I'm blanking on some fancy thing the JVM does to cache hashcodes and invalidate them when the underlying data changes, you would be doing a hashCode() check just to decide whether to recompute lengthNorm(). That takes nearly twice the time in the case where there was a mutation, and O(numNonZeroEntries) time instead of O(1) when there wasn't.

> Vector.getLengthSquared() is dangerously optimized
> --------------------------------------------------
>
>                 Key: MAHOUT-208
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-208
>             Project: Mahout
>          Issue Type: Bug
>          Components: Matrix
>    Affects Versions: 0.1
>         Environment: all
>            Reporter: Jake Mannix
>            Assignee: Sean Owen
>             Fix For: 0.3
>
>
> SparseVector and DenseVector both cache the value of lengthSquared, so that
> subsequent calls to it get the cached value. Great, except the cache is
> never cleared - calls to set/setQuick or assign or anything, all leave the
> cached value unchanged.
> Mutating method calls should set lengthNorm to -1 so that the cache is
> cleared.
> This could be a really nasty bug if hit.
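
For illustration, the fix suggested in the issue description (every mutating call resets the cached value to -1) could look roughly like the sketch below. This is not the actual Mahout code: the class CachedNormVector and its methods are hypothetical stand-ins for DenseVector, assuming a plain double[] as backing storage.

```java
// Hypothetical stand-in for a dense vector; not Mahout's DenseVector.
final class CachedNormVector {
    private final double[] values;

    // A squared length is never negative, so -1 can mean "cache is empty".
    private double lengthSquared = -1.0;

    CachedNormVector(double[] values) {
        // Copy so that callers cannot mutate the storage behind the cache's back.
        this.values = values.clone();
    }

    double get(int index) {
        return values[index];
    }

    // Every mutator clears the cache; this costs O(1) per write.
    void set(int index, double value) {
        values[index] = value;
        lengthSquared = -1.0;
    }

    // O(n) on the first call after a mutation, O(1) otherwise.
    double getLengthSquared() {
        if (lengthSquared >= 0.0) {
            return lengthSquared;
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v * v;
        }
        lengthSquared = sum;
        return sum;
    }
}
```

With this shape, an unmutated vector answers getLengthSquared() in O(1), and a mutation costs only one extra field write. A hashCode()-based validity check would instead have to walk all the entries on every call.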