It should be pretty easy to check via a new unit test if this iteration /
changing
values interleaved operation works.  It's hard to tell
if indexOfInsertion() is
implemented completely safely by inspection.


On Mon, Apr 15, 2013 at 10:50 AM, Robin Anil <[email protected]> wrote:

> On second thought both should work. The first method should not mutate if
> the element already exists. Now I am scared, this sounds to me like a bug
> in the OpenIntDoubleHashMap implementation.
>
> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
>
>
> On Mon, Apr 15, 2013 at 12:32 PM, Robin Anil <[email protected]> wrote:
>
> > Found the bug PearsonCorrelationSimilarity was trying to mutate the
> > object while iterating.
> >
> >
> >    1.     while (it.hasNext()) {
> >    2.       Vector.Element e = it.next();
> >    3.       *vector.set(e.index(),* e.get() - average);
> >    4.     }
> >
> > This has a side effect of causing the underlying hash-map or object to
> > change.
> >
> > The right behavior is to set the value of the index while iterating.
> >
> >    1.     while (it.hasNext()) {
> >    2.       Vector.Element e = it.next();
> >    3.       *e.set(e.get()* - average);
> >    4.     }
> >
> > I am sure we are incorrectly doing the first style across the code at
> many
> > places.
> >
> > I am proposing this
> >
> > When iterating, we lock the set interface on the vector using a State
> > enum. If anyone tries to mutate, we throw an exception.
> > We flip the state when we complete iterating (hasNext = false) or when we
> > explicitly close the iterator (adding a close method on the iterator).
> >
> > Again this is all a single thread fix. if a vector is being mutated and
> > iterated across multiple threads, all hell can break loose.
> >
> > Robin
> >
> >
> >
> > On Mon, Apr 15, 2013 at 12:56 AM, Robin Anil <[email protected]>
> wrote:
> >
> >> Spoke too soon still failure.  I am uploading the latest patch. These
> are
> >> the current failing tests.
> >>
> >>
>  
> ClusterClassificationDriverTest.testVectorClassificationWithOutlierRemovalMR:103->assertVectorsWithOutlierRemoval:189->checkClustersWithOutlierRemoval:239->Assert.assertTrue:41->Assert.fail:88
> >> not expecting cluster:{0:1.0,1:1.0}
> >>
> >>
> ClusterClassificationDriverTest.testVectorClassificationWithOutlierRemoval:139->assertVectorsWithOutlierRemoval:189->checkClustersWithOutlierRemoval:239->Assert.assertTrue:41->Assert.fail:88
> >> not expecting cluster:{0:1.0,1:1.0}
> >>
> >>
> ClusterClassificationDriverTest.testVectorClassificationWithoutOutlierRemoval:121->assertVectorsWithoutOutlierRemoval:193->assertFirstClusterWithoutOutlierRemoval:218->Assert.assertTrue:52->Assert.assertTrue:41->Assert.fail:86
> >> null
> >>
> >>
> ClusterOutputPostProcessorTest.testTopDownClustering:102->assertPostProcessedOutput:188->assertTopLevelCluster:115->assertPointsInSecondTopLevelCluster:134->Assert.assertTrue:52->Assert.assertTrue:41->Assert.fail:86
> >> null
> >>
> >>
> VectorSimilarityMeasuresTest.testPearsonCorrelationSimilarity:109->Assert.assertEquals:592->Assert.assertEquals:494->Assert.failNotEquals:743->Assert.fail:88
> >> expected:<0.5303300858899108> but was:<0.38729833462074176>
> >>
> >>
> >> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
> >>
> >>
> >> On Mon, Apr 15, 2013 at 12:24 AM, Robin Anil <[email protected]
> >wrote:
> >>
> >>> Found it, fixed it. I am submitting soon.
> >>>
> >>> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
> >>>
> >>>
> >>> On Sun, Apr 14, 2013 at 11:56 PM, Ted Dunning <[email protected]
> >wrote:
> >>>
> >>>> Robin,
> >>>>
> >>>> Can you make sure that the patches are somewhere that Dan can pick up
> >>>> this
> >>>> work?  He is in GMT+2 and is probably about to appear on the scene.
> >>>>
> >>>>
> >>>>
> >>>> On Sun, Apr 14, 2013 at 9:34 PM, Robin Anil <[email protected]>
> >>>> wrote:
> >>>>
> >>>> > Strike that there are still failures. Investigating. if I cant fix
> it
> >>>> in
> >>>> > the next hour, I will submit them sometime in the evening tomorrow.
> >>>> >
> >>>> > Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
> >>>> >
> >>>> >
> >>>> > On Sun, Apr 14, 2013 at 11:33 PM, Robin Anil <[email protected]>
> >>>> wrote:
> >>>> >
> >>>> > > Tests pass. Submitting the patches.
> >>>> > >
> >>>> > > Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
> >>>> > >
> >>>> > >
> >>>> > > On Sun, Apr 14, 2013 at 11:17 PM, Robin Anil <
> [email protected]>
> >>>> > wrote:
> >>>> > >
> >>>> > >> Added a few more tests. Throw NoSuchElementException like Java
> >>>> > >> Collections when iterating past the end. Things look solid,
> >>>> performance
> >>>> > is
> >>>> > >> 2x. All Math tests pass. I am now waiting for the entire test
> >>>> suites to
> >>>> > run
> >>>> > >> before submitting.
> >>>> > >>
> >>>> > >> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
> >>>> > >>
> >>>> > >>
> >>>> > >> On Sun, Apr 14, 2013 at 9:49 PM, Robin Anil <
> [email protected]>
> >>>> > wrote:
> >>>> > >>
> >>>> > >>> I am not sure what I did. But removing Guava Abstract iterator
> >>>> actually
> >>>> > >>> sped up the dot, cosine, euclidean by another 60%. Things are
> now
> >>>> 2x
> >>>> > faster
> >>>> > >>> than trunk. While also correcting the behavior (I hope)
> >>>> > >>>
> >>>> > >>>
> >>>> > >>>
> >>>> >
> >>>>
> https://docs.google.com/spreadsheet/ccc?key=0AhewTD_ZgznddGFQbWJCQTZXSnFULUYzdURfWDRJQlE#gid=1
> >>>> > >>>
> >>>> > >>> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
> >>>> > >>>
> >>>> > >>>
> >>>> > >>> On Sun, Apr 14, 2013 at 8:56 PM, Robin Anil <
> [email protected]
> >>>> > >wrote:
> >>>> > >>>
> >>>> > >>>> Also note that this is code gen, I have to create
> >>>> > Element$keyType$Value
> >>>> > >>>> for each and every combination not just int double. and also
> >>>> update
> >>>> > all
> >>>> > >>>> callers to user ElementIntDouble instead of Element. Is it
> worth
> >>>> it ?
> >>>> > >>>>
> >>>> > >>>> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
> >>>> > >>>>
> >>>> > >>>>
> >>>> > >>>> On Sun, Apr 14, 2013 at 8:46 PM, Ted Dunning <
> >>>> [email protected]
> >>>> > >wrote:
> >>>> > >>>>
> >>>> > >>>>> Collections (no longer colt collections) are now part of
> mahout
> >>>> math.
> >>>> > >>>>>  No
> >>>> > >>>>> need to keep them separate.  The lower iterator can reference
> >>>> > >>>>> Vector.Element
> >>>> > >>>>>
> >>>> > >>>>>
> >>>> > >>>>> On Sun, Apr 14, 2013 at 6:24 PM, Robin Anil <
> >>>> [email protected]>
> >>>> > >>>>> wrote:
> >>>> > >>>>>
> >>>> > >>>>> > I would have loved to but Element is a sub interface in
> >>>> Vector. If
> >>>> > >>>>> we want
> >>>> > >>>>> > to keep colt collections separate we have to keep this
> >>>> separation.
> >>>> > >>>>> >
> >>>> > >>>>>
> >>>> > >>>>
> >>>> > >>>>
> >>>> > >>>
> >>>> > >>
> >>>> > >
> >>>> >
> >>>>
> >>>
> >>>
> >>
> >
>



-- 

  -jake

Reply via email to