It should be pretty easy to check via a new unit test if this iteration / changing values interleaved operation works. It's hard to tell if indexOfInsertion() is implemented completely safely by inspection.
On Mon, Apr 15, 2013 at 10:50 AM, Robin Anil <[email protected]> wrote: > On second thought both should work. The first method should not mutate if > the element already exists. Now I am scared, this sounds to me like a bug > in the OpenIntDoubleHashMap implementation. > > Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc. > > > On Mon, Apr 15, 2013 at 12:32 PM, Robin Anil <[email protected]> wrote: > > > Found the bug PearsonCorrelationSimilarity was trying to mutate the > > object while iterating. > > > > > > 1. while (it.hasNext()) { > > 2. Vector.Element e = it.next(); > > 3. *vector.set(e.index(),* e.get() - average); > > 4. } > > > > This has a side effect of causing the underlying hash-map or object to > > change. > > > > The right behavior is to set the value of the index while iterating. > > > > 1. while (it.hasNext()) { > > 2. Vector.Element e = it.next(); > > 3. *e.set(e.get()* - average); > > 4. } > > > > I am sure we are incorrectly doing the first style across the code at > many > > places. > > > > I am proposing this > > > > When iterating, we lock the set interface on the vector using a State > > enum. If anyone tries to mutate, we throw an exception. > > We flip the state when we complete iterating (hasNext = false) or when we > > explicitly close the iterator (adding a close method on the iterator). > > > > Again this is all a single thread fix. if a vector is being mutated and > > iterated across multiple threads, all hell can break loose. > > > > Robin > > > > > > > > On Mon, Apr 15, 2013 at 12:56 AM, Robin Anil <[email protected]> > wrote: > > > >> Spoke too soon still failure. I am uploading the latest patch. These > are > >> the current failing tests. > >> > >> > > ClusterClassificationDriverTest.testVectorClassificationWithOutlierRemovalMR:103->assertVectorsWithOutlierRemoval:189->checkClustersWithOutlierRemoval:239->Assert.assertTrue:41->Assert.fail:88 > >> not expecting cluster:{0:1.0,1:1.0} > >> > >> > ClusterClassificationDriverTest.testVectorClassificationWithOutlierRemoval:139->assertVectorsWithOutlierRemoval:189->checkClustersWithOutlierRemoval:239->Assert.assertTrue:41->Assert.fail:88 > >> not expecting cluster:{0:1.0,1:1.0} > >> > >> > ClusterClassificationDriverTest.testVectorClassificationWithoutOutlierRemoval:121->assertVectorsWithoutOutlierRemoval:193->assertFirstClusterWithoutOutlierRemoval:218->Assert.assertTrue:52->Assert.assertTrue:41->Assert.fail:86 > >> null > >> > >> > ClusterOutputPostProcessorTest.testTopDownClustering:102->assertPostProcessedOutput:188->assertTopLevelCluster:115->assertPointsInSecondTopLevelCluster:134->Assert.assertTrue:52->Assert.assertTrue:41->Assert.fail:86 > >> null > >> > >> > VectorSimilarityMeasuresTest.testPearsonCorrelationSimilarity:109->Assert.assertEquals:592->Assert.assertEquals:494->Assert.failNotEquals:743->Assert.fail:88 > >> expected:<0.5303300858899108> but was:<0.38729833462074176> > >> > >> > >> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc. > >> > >> > >> On Mon, Apr 15, 2013 at 12:24 AM, Robin Anil <[email protected] > >wrote: > >> > >>> Found it, fixed it. I am submitting soon. > >>> > >>> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc. > >>> > >>> > >>> On Sun, Apr 14, 2013 at 11:56 PM, Ted Dunning <[email protected] > >wrote: > >>> > >>>> Robin, > >>>> > >>>> Can you make sure that the patches are somewhere that Dan can pick up > >>>> this > >>>> work? He is in GMT+2 and is probably about to appear on the scene. > >>>> > >>>> > >>>> > >>>> On Sun, Apr 14, 2013 at 9:34 PM, Robin Anil <[email protected]> > >>>> wrote: > >>>> > >>>> > Strike that there are still failures. Investigating. if I cant fix > it > >>>> in > >>>> > the next hour, I will submit them sometime in the evening tomorrow. > >>>> > > >>>> > Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc. > >>>> > > >>>> > > >>>> > On Sun, Apr 14, 2013 at 11:33 PM, Robin Anil <[email protected]> > >>>> wrote: > >>>> > > >>>> > > Tests pass. Submitting the patches. > >>>> > > > >>>> > > Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc. > >>>> > > > >>>> > > > >>>> > > On Sun, Apr 14, 2013 at 11:17 PM, Robin Anil < > [email protected]> > >>>> > wrote: > >>>> > > > >>>> > >> Added a few more tests. Throw NoSuchElementException like Java > >>>> > >> Collections when iterating past the end. Things look solid, > >>>> performance > >>>> > is > >>>> > >> 2x. All Math tests pass. I am now waiting for the entire test > >>>> suites to > >>>> > run > >>>> > >> before submitting. > >>>> > >> > >>>> > >> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc. > >>>> > >> > >>>> > >> > >>>> > >> On Sun, Apr 14, 2013 at 9:49 PM, Robin Anil < > [email protected]> > >>>> > wrote: > >>>> > >> > >>>> > >>> I am not sure what I did. But removing Guava Abstract iterator > >>>> actually > >>>> > >>> sped up the dot, cosine, euclidean by another 60%. Things are > now > >>>> 2x > >>>> > faster > >>>> > >>> than trunk. While also correcting the behavior (I hope) > >>>> > >>> > >>>> > >>> > >>>> > >>> > >>>> > > >>>> > https://docs.google.com/spreadsheet/ccc?key=0AhewTD_ZgznddGFQbWJCQTZXSnFULUYzdURfWDRJQlE#gid=1 > >>>> > >>> > >>>> > >>> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc. > >>>> > >>> > >>>> > >>> > >>>> > >>> On Sun, Apr 14, 2013 at 8:56 PM, Robin Anil < > [email protected] > >>>> > >wrote: > >>>> > >>> > >>>> > >>>> Also note that this is code gen, I have to create > >>>> > Element$keyType$Value > >>>> > >>>> for each and every combination not just int double. and also > >>>> update > >>>> > all > >>>> > >>>> callers to user ElementIntDouble instead of Element. Is it > worth > >>>> it ? > >>>> > >>>> > >>>> > >>>> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc. > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> On Sun, Apr 14, 2013 at 8:46 PM, Ted Dunning < > >>>> [email protected] > >>>> > >wrote: > >>>> > >>>> > >>>> > >>>>> Collections (no longer colt collections) are now part of > mahout > >>>> math. > >>>> > >>>>> No > >>>> > >>>>> need to keep them separate. The lower iterator can reference > >>>> > >>>>> Vector.Element > >>>> > >>>>> > >>>> > >>>>> > >>>> > >>>>> On Sun, Apr 14, 2013 at 6:24 PM, Robin Anil < > >>>> [email protected]> > >>>> > >>>>> wrote: > >>>> > >>>>> > >>>> > >>>>> > I would have loved to but Element is a sub interface in > >>>> Vector. If > >>>> > >>>>> we want > >>>> > >>>>> > to keep colt collections separate we have to keep this > >>>> separation. > >>>> > >>>>> > > >>>> > >>>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>> > >>>> > >> > >>>> > > > >>>> > > >>>> > >>> > >>> > >> > > > -- -jake
