Shai can you try the patch on LUCENE-2568?

Thanks.

Mike

On Mon, Jul 26, 2010 at 4:25 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
> OK I think likely this is a bug in RAS. And we are just seeing the
> difference in how Oracle's & IBM's JREs handle an unpaired
> surrogate...
>
> Lemme work out a patch...
>
> Mike
>
> On Mon, Jul 26, 2010 at 4:13 PM, Michael McCandless
> <luc...@mikemccandless.com> wrote:
>> Yeah that char is a high surrogate which is unpaired, which is no good
>> -- it's invalid. Cool, though, that Google puts us first when you
>> search on this character :)
>>
>> Can you figure out how that bad string was created? That "if
>> (random.nextBoolean())" either creates the string randomly (which
>> should never return unpaired surrogate), or, calls
>> RandomAcceptedString.getRandomAcceptedString... maybe the bug is in
>> RAS.
>>
>> Mike
>>
>> On Mon, Jul 26, 2010 at 3:41 PM, Shai Erera <ser...@gmail.com> wrote:
>>> From here: http://www.fileformat.info/info/unicode/char/d9ff/index.htm
>>>
>>> Looks like that character is not a valid Unicode character, and perhaps the
>>> IBM's JVM behaves correctly? Robert - you're the Unicode expert :).
>>>
>>> Shai
>>>
>>> On Mon, Jul 26, 2010 at 10:40 PM, Shai Erera <ser...@gmail.com> wrote:
>>>>
>>>> I don't know what was the thing w/ the strings generated before, but now I
>>>> ran the test again w/ the same seed and it generates the same strings. So at
>>>> least it seems there are no problems w/ the Random class :).
>>>>
>>>> However, the string l.E fails w/ the IBM JVM and succeeds w/ SUN's. Any
>>>> ideas why? What does the test check anyway?
>>>>
>>>> I ran TRR2, and set the regexp to always be "l.E" and the test passes.
>>>> The failure comes from
>>>>
>>>> junit.framework.AssertionFailedError: expected:<true> but was:<false>
>>>>     at org.apache.lucene.util.automaton.TestUTF32ToUTF8.assertAutomaton(TestUTF32ToUTF8.java:199)
>>>>     at org.apache.lucene.util.automaton.TestUTF32ToUTF8.testRandomRegexes(TestUTF32ToUTF8.java:171)
>>>>
>>>> I've set regexp to "l.E", and also 'string' inside assertAutomaton to
>>>> "\u006C\uD9FF\u0045". The byte[] returned from string.getBytes("UTF-8") are
>>>> [108, 69]. It just ignores the middle character. Perhaps that's why the test
>>>> fails?
>>>>
>>>> When I run this w/ SUN's JVM, the bytes returned are [108, 63, 69].
>>>>
>>>> If I manually set the bytes, using IBM's, to [108, 63, 69], then the test
>>>> passes.
>>>>
>>>> Interestingly, Googling for \uD9FF brings back LUCENE-2019 as the first
>>>> result :). I'll dig some more into this character, and why the IBM and SUN
>>>> JVMs return different byte[] representation for the same sequence of
>>>> characters. If you already spot the problem, please let me know.
>>>>
>>>> BTW, the test calls _TestUtil.getRandomMultiplier on every iteration loop,
>>>> which goes and checks a system property. Perhaps we can extract it to a
>>>> variable, or include a static constant in LuceneTestCase(J4) or something?
>>>>
>>>> Shai
>>>>
>>>> On Mon, Jul 26, 2010 at 9:22 PM, Robert Muir <rcm...@gmail.com> wrote:
>>>>>
>>>>> maybe there is a bug in ibm's random generator :)
>>>>>
>>>>> On Mon, Jul 26, 2010 at 11:50 AM, Michael McCandless
>>>>> <luc...@mikemccandless.com> wrote:
>>>>>>
>>>>>> That's VERY spooky that w/ a fixed seed you see different random
>>>>>> regexps being made.
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>> On Mon, Jul 26, 2010 at 11:40 AM, Shai Erera <ser...@gmail.com> wrote:
>>>>>> > Ok I've dug deeper into the test. I set the random seed to
>>>>>> > -9029631602016965389L in setUp(), and discovered that on the 4th
>>>>>> > iteration it breaks.
>>>>>> > For some reason though, AutomatonTestUtil.randomRegex generates
>>>>>> > different strings every time I run the test, even though it uses the
>>>>>> > same Random object w/ the same seed ...
>>>>>> >
>>>>>> > Anyway, one of the regexes that failed was this "l.E" (w/o the quotes)
>>>>>> > and I think it's a lowercase L, '.' (dot) and 'E' (uppercase). Hope this
>>>>>> > helps.
>>>>>> >
>>>>>> > Shai
>>>>>> >
>>>>>> > On Mon, Jul 26, 2010 at 6:23 PM, Robert Muir <rcm...@gmail.com> wrote:
>>>>>> >>
>>>>>> >> sounds nasty... it's good you are running the tests with this
>>>>>> >> different jvm...
>>>>>> >>
>>>>>> >> On Mon, Jul 26, 2010 at 11:21 AM, Shai Erera <ser...@gmail.com>
>>>>>> >> wrote:
>>>>>> >>>
>>>>>> >>> Tried to run it w/ SUN JRE6 and it succeeds! I've tried several
>>>>>> >>> times and it succeeds every time. However, when I revert back to
>>>>>> >>> IBM's, it fails immediately.
>>>>>> >>>
>>>>>> >>> I can help w/ the debug, if you give me a hint where to look :).
>>>>>> >>>
>>>>>> >>> Shai
>>>>>> >>>
>>>>>> >>> On Mon, Jul 26, 2010 at 5:57 PM, Shai Erera <ser...@gmail.com>
>>>>>> >>> wrote:
>>>>>> >>>>
>>>>>> >>>> Sorry for the delayed response.
>>>>>> >>>>
>>>>>> >>>> I ran it a couple more times, from Eclipse and Ant, and each time
>>>>>> >>>> it fails (amazing!), w/ different seeds. More seeds that fail:
>>>>>> >>>> NOTE: random seed of testcase 'testRandomRegexes' was: -4244174191361080127
>>>>>> >>>> NOTE: random seed of testcase 'testRandomRegexes' was: -7059086272401721644
>>>>>> >>>> NOTE: random seed of testcase 'testRandomRegexes' was: -1314734215611104147
>>>>>> >>>>
>>>>>> >>>> I use IBM JVM, tried w/ both 1.5 and 1.6 ...
>>>>>> >>>>
>>>>>> >>>> Mike, can we use LUCENE-2565 to track this, or would you prefer
>>>>>> >>>> that I open a separate one?
>>>>>> >>>>
>>>>>> >>>> Shai
>>>>>> >>>>
>>>>>> >>>> On Mon, Jul 26, 2010 at 3:26 PM, Michael McCandless
>>>>>> >>>> <luc...@mikemccandless.com> wrote:
>>>>>> >>>>>
>>>>>> >>>>> On a more general note...
>>>>>> >>>>>
>>>>>> >>>>> Any time any of you out there hit an "odd" test failure, please
>>>>>> >>>>> please please do just what Shai did: take it to the dev list!
>>>>>> >>>>>
>>>>>> >>>>> Think of Lucene's unit tests like SETI :) We are desperately
>>>>>> >>>>> seeking bugs, and you and your machine may just be lucky enough
>>>>>> >>>>> to find one... go forth and buy expensive new power hungry
>>>>>> >>>>> computers just so you can run the random tests over and over,
>>>>>> >>>>> seeking the bugs!
>>>>>> >>>>>
>>>>>> >>>>> But be sure to include that random seed when you do hit a
>>>>>> >>>>> failure...
>>>>>> >>>>>
>>>>>> >>>>> Mike
>>>>>> >>>>>
>>>>>> >>>>> On Mon, Jul 26, 2010 at 8:23 AM, Robert Muir <rcm...@gmail.com>
>>>>>> >>>>> wrote:
>>>>>> >>>>> > I agree, Shai can you open a bug? I cannot reproduce, did you
>>>>>> >>>>> > use an IBM JVM or another environment that might help us
>>>>>> >>>>> > figure it out?
>>>>>> >>>>> >
>>>>>> >>>>> > On Mon, Jul 26, 2010 at 6:29 AM, Michael McCandless
>>>>>> >>>>> > <luc...@mikemccandless.com> wrote:
>>>>>> >>>>> >>
>>>>>> >>>>> >> Hmmm this means a bug is lurking. This is the power of random
>>>>>> >>>>> >> testing (that every time we all run tests, we're testing
>>>>>> >>>>> >> different "paths" through the code)....
>>>>>> >>>>> >>
>>>>>> >>>>> >> It seems exceptionally unlikely that LUCENE-2537's changes
>>>>>> >>>>> >> would cause this!
>>>>>> >>>>> >>
>>>>>> >>>>> >> But, unfortunately, when I plug that seed in I don't see it
>>>>>> >>>>> >> fail, which is odd. I'll run a stress test to see if I can
>>>>>> >>>>> >> tickle the bug...
>>>>>> >>>>> >> can you open a Jira issue so we don't lose track?
>>>>>> >>>>> >>
>>>>>> >>>>> >> Mike
>>>>>> >>>>> >>
>>>>>> >>>>> >> On Mon, Jul 26, 2010 at 2:57 AM, Shai Erera <ser...@gmail.com>
>>>>>> >>>>> >> wrote:
>>>>>> >>>>> >> > Hi
>>>>>> >>>>> >> >
>>>>>> >>>>> >> > I was running tests on trunk (after merging the changes from
>>>>>> >>>>> >> > LUCENE-2537) and received this error message:
>>>>>> >>>>> >> >
>>>>>> >>>>> >> > expected:<true> but was:<false>
>>>>>> >>>>> >> >
>>>>>> >>>>> >> > junit.framework.AssertionFailedError: expected:<true> but was:<false>
>>>>>> >>>>> >> >     at org.apache.lucene.util.automaton.TestUTF32ToUTF8.assertAutomaton(TestUTF32ToUTF8.java:197)
>>>>>> >>>>> >> >     at org.apache.lucene.util.automaton.TestUTF32ToUTF8.testRandomRegexes(TestUTF32ToUTF8.java:170)
>>>>>> >>>>> >> >     at org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:285)
>>>>>> >>>>> >> >
>>>>>> >>>>> >> > NOTE: random seed of testcase 'testRandomRegexes' was:
>>>>>> >>>>> >> > 3510820306304573866
>>>>>> >>>>> >> >
>>>>>> >>>>> >> > I'm sure it's related to my changes. Has anyone else seen
>>>>>> >>>>> >> > this before?
>>>>>> >>>>> >> >
>>>>>> >>>>> >> > Shai
>>>>>> >>>>> >>
>>>>>> >>>>> >> ---------------------------------------------------------------------
>>>>>> >>>>> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>>>>> >>>>> >> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>>>> >>>>> >
>>>>>> >>>>> > --
>>>>>> >>>>> > Robert Muir
>>>>>> >>>>> > rcm...@gmail.com
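
For anyone who wants to poke at the byte-level difference discussed in this thread without going through String.getBytes("UTF-8") (whose behavior for input that cannot be encoded is documented as unspecified), here is a minimal sketch. It assumes Java 7 or later for StandardCharsets. SurrogateCheck, hasUnpairedSurrogate and encodeStrict are hypothetical names made up for illustration; they are not Lucene APIs and this is not the LUCENE-2568 patch.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Lucene.
final class SurrogateCheck {

  /** Returns true if s contains a high or low surrogate without its partner. */
  static boolean hasUnpairedSurrogate(String s) {
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (Character.isHighSurrogate(c)) {
        if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
          i++; // well-formed pair: skip the low half
        } else {
          return true; // high surrogate with no low surrogate after it
        }
      } else if (Character.isLowSurrogate(c)) {
        return true; // low surrogate with no high surrogate before it
      }
    }
    return false;
  }

  /** Encodes to UTF-8 and throws on malformed input instead of replacing or dropping it. */
  static byte[] encodeStrict(String s) throws CharacterCodingException {
    CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    ByteBuffer buf = encoder.encode(CharBuffer.wrap(s));
    byte[] out = new byte[buf.remaining()];
    buf.get(out);
    return out;
  }

  public static void main(String[] args) {
    // The string from the failing test: 'l', a lone high surrogate, 'E'.
    String bad = "\u006C\uD9FF\u0045";
    System.out.println("unpaired surrogate present: " + hasUnpairedSurrogate(bad));
    try {
      encodeStrict(bad);
      System.out.println("encoded without error");
    } catch (CharacterCodingException e) {
      System.out.println("strict encoder rejected the string: " + e);
    }
  }
}
```

With CodingErrorAction.REPORT the encoder should raise a CharacterCodingException for the lone \uD9FF rather than substituting '?' (63) or skipping the char, so a test could reject such strings up front instead of depending on how a particular JRE handles them.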