This looks very wrong. The iterators for SASV extend guava's AbstractIterator, but they do reuse the NonDefaultElement instance internally. It *looks* like we're correctly satisfying the AbstractIterator#computeNext() contract, but we must not be if we're mutating on multiple hasNext() calls...
On Fri, Apr 12, 2013 at 9:36 AM, Dan Filimon <[email protected]>wrote: > While looking at the patch for fixing the sparse vectors (MAHOUT-1190), I > started working with vector Iterators doing what I thought was reasonable. > > This is the important snippet: > [...] > thisIterator = this.iterateNonZero(); > thatIterator = other.iterateNonZero(); > thisElement = thatElement = null; > boolean advanceThis = true; > boolean advanceThat = true; > OrderedIntDoubleMapping thisUpdates = new > OrderedIntDoubleMapping(); > > while (thisIterator.hasNext() && thatIterator.hasNext()) { > if (advanceThis) { > thisElement = thisIterator.next(); > } > if (advanceThat) { > thatElement = thatIterator.next(); > } > [... advanceThis and advanceThat are set to true based on which iterator to > advance...] > > The problem here is that when calling next(), the iterator state gets > invalidated and when calling hasNext() the iterator will be advanced > accordingly and the element references will point to the next element > (which is mutated). > > So, if the indices start at: > 52 and 87 > despite wanting to only advance the 52, since both were accessed with > next(), they are both modified. > > Here's another snippet with this behavior [1]: > > Vector vector = new SequentialAccessSparseVector(100); > vector.set(0, 1); > vector.set(2, 2); > vector.set(4, 3); > vector.set(6, 4); > Iterator<Vector.Element> vectorIterator = vector.iterateNonZero(); > Vector.Element element = null; > int i = 0; > while (vectorIterator.hasNext()) { > if (i % 2 == 0) { > element = vectorIterator.next(); > } > System.out.printf("%d %d %f\n", i, element.index(), element.get()); > ++i; > } > > > The output is: > 0 0 1.000000 > 1 2 2.000000 > 2 2 2.000000 > 3 4 3.000000 > 4 4 3.000000 > 5 6 4.000000 > 6 6 4.000000 > > I expected it to be: > 0 0 1.000000 > 1 0 1.000000 > 2 2 2.000000 > 3 2 2.000000 > 4 4 3.000000 > 5 4 3.000000 > 6 6 4.000000 > > So, I'm completely wrong. Is this just me not understanding what an > iterator is supposed to do? > > [1] https://gist.github.com/dfilimon/5373271 > -- -jake
