[ https://issues.apache.org/jira/browse/MAHOUT-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782077#action_12782077 ]
Jake Mannix commented on MAHOUT-207: ------------------------------------ We definitely should include the optimization that set(i, 0) should work fast, but it's not trivial to do. Hash based impls need to yes, remove the previous entry. Array-based sparse vectors do what, exactly? A vector represented as { indices: int[] { 1, 3, 5 }, values: double[] { 1.1, 2.2, 3.3 } } to start, gets a call to setQuick(3, 0), so that in the current implementation it becomes { indices: int[] { 1, 3, 5 }, values: double[] { 1.1, 0, 3.3 } }. What would you suggest be done to "remove" the entry efficiently? What would be useful, is that all sparse vector impls have a "compact()" method which removes all zeroes and finds a small-space representation. But either way, we should not *require* in the contract for sparse vector that there be no zero values. Regarding the call to equivalent, you're right, that should not be done as the static method - if equivalent was non-static, then it could be overridden to be done smartly by subclasses (ie. if one of the two being compared is a DenseVector or RandomAccessSparseVector, iterate over the other checking the dense one via getQuick(), and if both of them are SequentialAccessSparseVectors, both iterators can be walked in parallel). > AbstractVector.hashCode() should not care about the order of iteration over > elements > ------------------------------------------------------------------------------------ > > Key: MAHOUT-207 > URL: https://issues.apache.org/jira/browse/MAHOUT-207 > Project: Mahout > Issue Type: Bug > Components: Matrix > Affects Versions: 0.2 > Environment: all > Reporter: Jake Mannix > Assignee: Grant Ingersoll > Fix For: 0.3 > > Attachments: MAHOUT-207.patch > > > As was discussed in MAHOUT-165, hashCode can be implemented simply like this: > {code} > public int hashCode() { > final int prime = 31; > int result = prime + ((name == null) ? 0 : name.hashCode()); > result = prime * result + size(); > Iterator<Element> iter = iterateNonZero(); > while (iter.hasNext()) { > Element ele = iter.next(); > long v = Double.doubleToLongBits(ele.get()); > result += (ele.index() * (int)(v^(v>>32))); > } > return result; > } > {code} > which obviates the need to sort the elements in the case of a random access > hash-based implementation. Also, (ele.index() * (int)(v^(v>>32)) ) == 0 when > v = Double.doubleToLongBits(0d), which avoids the wrong hashCode() for sparse > vectors which have zero elements returned from the iterateNonZero() iterator. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.