Shai, can you try the patch on LUCENE-2568?  Thanks.

Mike

On Mon, Jul 26, 2010 at 4:25 PM, Michael McCandless
<luc...@mikemccandless.com> wrote:
> OK I think likely this is a bug in RAS.  And we are just seeing the
> difference in how Oracle's & IBM's JREs handle an unpaired
> surrogate...
>
> Lemme work out a patch...
>
> Mike
>
> On Mon, Jul 26, 2010 at 4:13 PM, Michael McCandless
> <luc...@mikemccandless.com> wrote:
>> Yeah that char is a high surrogate which is unpaired, which is no good
>> -- it's invalid.  Cool, though, that Google puts us first when you
>> search on this character :)
>>
>> Can you figure out how that bad string was created?  That "if
>> (random.nextBoolean())" either creates the string randomly (which
>> should never return an unpaired surrogate), or calls
>> RandomAcceptedString.getRandomAcceptedString... maybe the bug is in
>> RAS.
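
For illustration, a minimal sketch of how a test could check a generated string for unpaired surrogates before using it. SurrogateCheck and firstUnpairedSurrogate are hypothetical names, not part of Lucene; only standard java.lang.Character methods are used.

```java
final class SurrogateCheck {

    /** Returns the index of the first unpaired surrogate in s, or -1 if there is none. */
    static int firstUnpairedSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                // A high surrogate is only valid when a low surrogate follows it.
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // skip the second half of the valid pair
                } else {
                    return i;
                }
            } else if (Character.isLowSurrogate(c)) {
                // A low surrogate that was not consumed as part of a pair is unpaired.
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The failing string from this thread: 'l', a lone high surrogate, 'E'.
        System.out.println(firstUnpairedSurrogate("l\uD9FFE"));       // 1
        // A well-formed supplementary character between the same letters.
        System.out.println(firstUnpairedSurrogate("l\uD83D\uDE00E")); // -1
    }
}
```

A check like this could run on every string a random generator hands to the test, so a bad generator fails at the source rather than in a later assertion.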
>>
>> Mike
>>
>> On Mon, Jul 26, 2010 at 3:41 PM, Shai Erera <ser...@gmail.com> wrote:
>>> From here: http://www.fileformat.info/info/unicode/char/d9ff/index.htm
>>>
>>> Looks like that character is not a valid Unicode character, and perhaps
>>> IBM's JVM behaves correctly? Robert - you're the Unicode expert :).
>>>
>>> Shai
>>>
>>> On Mon, Jul 26, 2010 at 10:40 PM, Shai Erera <ser...@gmail.com> wrote:
>>>>
>>>> I don't know what happened w/ the strings generated before, but now I
>>>> ran the test again w/ the same seed and it generates the same strings. So
>>>> at least it seems there are no problems w/ the Random class :).
>>>>
>>>> However, the string l.E fails w/ the IBM JVM and succeeds w/ SUN's. Any
>>>> ideas why? What does the test check anyway?
>>>>
>>>> I ran TRR2, and set the regexp to always be "l.E" and the test passes. The
>>>> failure comes from
>>>>
>>>> junit.framework.AssertionFailedError: expected:<true> but was:<false>
>>>>     at org.apache.lucene.util.automaton.TestUTF32ToUTF8.assertAutomaton(TestUTF32ToUTF8.java:199)
>>>>     at org.apache.lucene.util.automaton.TestUTF32ToUTF8.testRandomRegexes(TestUTF32ToUTF8.java:171)
>>>>
>>>> I've set regexp to "l.E", and also 'string' inside assertAutomaton to
>>>> "\u006C\uD9FF\u0045". The bytes returned from string.getBytes("UTF-8") are
>>>> [108, 69]. It just ignores the middle character. Perhaps that's why the
>>>> test fails?
>>>>
>>>> When I run this w/ SUN's JVM, the bytes returned are [108, 63, 69].
>>>>
>>>> If I manually set the bytes, using IBM's, to [108, 63, 69], then the test
>>>> passes.
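
The Javadoc for String.getBytes(String charsetName) says the behavior is unspecified when the string cannot be encoded in the given charset, which would be consistent with the two JREs disagreeing on an unpaired surrogate. As a rough sketch (StrictUtf8 is a hypothetical helper, and StandardCharsets assumes Java 7 or later), a CharsetEncoder set to CodingErrorAction.REPORT rejects such input instead of silently substituting or dropping it:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class StrictUtf8 {

    /** Encodes s as UTF-8, throwing if s contains input that cannot be encoded. */
    static byte[] encodeStrict(String s) {
        try {
            ByteBuffer bb = StandardCharsets.UTF_8.newEncoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .encode(CharBuffer.wrap(s));
            byte[] out = new byte[bb.remaining()];
            bb.get(out);
            return out;
        } catch (CharacterCodingException e) {
            // Unchecked wrapper so callers in tests do not need a throws clause.
            throw new IllegalArgumentException("cannot encode as UTF-8", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encodeStrict("lE")));
        try {
            encodeStrict("l\uD9FFE"); // lone high surrogate in the middle
            System.out.println("encoded");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getCause());
        }
    }
}
```

Using a strict encoder in the test would make the failure identical on every JRE, whatever each one does by default.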
>>>>
>>>> Interestingly, Googling for \uD9FF brings back LUCENE-2019 as the first
>>>> result :). I'll dig some more into this character, and why the IBM and SUN
>>>> JVMs return different byte[] representation for the same sequence of
>>>> characters. If you already spot the problem, please let me know.
>>>>
>>>> BTW, the test calls _TestUtil.getRandomMultiplier on every iteration loop,
>>>> which goes and checks a system property. Perhaps we can extract it to a
>>>> variable, or include a static constant in LuceneTestCase(J4) or something?
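
A sketch of the static-constant idea. The class name and the property name "example.random.multiplier" are placeholders chosen for illustration; the real property read by _TestUtil.getRandomMultiplier is not shown in this thread.

```java
final class RandomMultiplier {

    // Read the system property once, at class initialization, instead of on
    // every loop iteration. The property name here is hypothetical.
    static final int VALUE =
            Math.max(1, Integer.getInteger("example.random.multiplier", 1));

    /** Scales a base iteration count by the cached multiplier. */
    static int iterations(int base) {
        return base * VALUE;
    }

    public static void main(String[] args) {
        int n = iterations(100); // computed once, outside the loop
        for (int i = 0; i < n; i++) {
            // one random test iteration would run here
        }
        System.out.println("iterations: " + n);
    }
}
```

The trade-off is that a constant fixed at class load no longer picks up a property changed while the tests are running, which is probably acceptable for a test-wide setting.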
>>>>
>>>> Shai
>>>>
>>>> On Mon, Jul 26, 2010 at 9:22 PM, Robert Muir <rcm...@gmail.com> wrote:
>>>>>
>>>>> maybe there is a bug in ibm's random generator :)
>>>>>
>>>>> On Mon, Jul 26, 2010 at 11:50 AM, Michael McCandless
>>>>> <luc...@mikemccandless.com> wrote:
>>>>>>
>>>>>> That's VERY spooky that w/ a fixed seed you see different random
>>>>>> regexps being made.
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>> On Mon, Jul 26, 2010 at 11:40 AM, Shai Erera <ser...@gmail.com> wrote:
>>>>>> > Ok I've dug deeper into the test. I set the random seed to
>>>>>> > -9029631602016965389L in setUp(), and discovered that on the 4th
>>>>>> > iteration it breaks. For some reason though,
>>>>>> > AutomatonTestUtil.randomRegex generates different strings every time
>>>>>> > I run the test, even though it uses the same Random object w/ the
>>>>>> > same seed ...
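
For reference, java.util.Random documents its generator algorithm, so two instances created with the same seed and driven by the same sequence of calls should produce the same values on any conforming JVM. A small sketch using the seed quoted above (SeedRepro and randomLetters are hypothetical names, unrelated to AutomatonTestUtil):

```java
import java.util.Random;

final class SeedRepro {

    /** Builds a string of lowercase ASCII letters drawn from the given Random. */
    static String randomLetters(Random random, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + random.nextInt(26)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long seed = -9029631602016965389L;
        String first = randomLetters(new Random(seed), 8);
        String second = randomLetters(new Random(seed), 8);
        // Same seed and same call sequence, so the two strings match.
        System.out.println(first.equals(second)); // true
    }
}
```

If runs with a fixed seed still differ, the cause is more likely an extra or missing call on the shared Random somewhere than the generator itself.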
>>>>>> >
>>>>>> > Anyway, one of the regexes that failed was this "l.E" (w/o the
>>>>>> > quotes), and I think it's a lowercase L, '.' (dot) and 'E'
>>>>>> > (uppercase). Hope this helps.
>>>>>> >
>>>>>> > Shai
>>>>>> >
>>>>>> > On Mon, Jul 26, 2010 at 6:23 PM, Robert Muir <rcm...@gmail.com> wrote:
>>>>>> >>
>>>>>> >> sounds nasty... it's good you are running the tests with this
>>>>>> >> different jvm...
>>>>>> >>
>>>>>> >> On Mon, Jul 26, 2010 at 11:21 AM, Shai Erera <ser...@gmail.com>
>>>>>> >> wrote:
>>>>>> >>>
>>>>>> >>> Tried to run it w/ SUN JRE6 and it succeeds! I've tried several
>>>>>> >>> times and it succeeds every time. However, when I revert back to
>>>>>> >>> IBM's, it fails immediately.
>>>>>> >>>
>>>>>> >>> I can help w/ the debug, if you give me a hint where to look :).
>>>>>> >>>
>>>>>> >>> Shai
>>>>>> >>>
>>>>>> >>> On Mon, Jul 26, 2010 at 5:57 PM, Shai Erera <ser...@gmail.com>
>>>>>> >>> wrote:
>>>>>> >>>>
>>>>>> >>>> Sorry for the delayed response.
>>>>>> >>>>
>>>>>> >>>> I ran it a couple more times, from Eclipse and Ant, and each time
>>>>>> >>>> it fails (amazing!), w/ different seeds. More seeds that fail:
>>>>>> >>>> NOTE: random seed of testcase 'testRandomRegexes' was: -4244174191361080127
>>>>>> >>>> NOTE: random seed of testcase 'testRandomRegexes' was: -7059086272401721644
>>>>>> >>>> NOTE: random seed of testcase 'testRandomRegexes' was: -1314734215611104147
>>>>>> >>>>
>>>>>> >>>> I use IBM JVM, tried w/ both 1.5 and 1.6 ...
>>>>>> >>>>
>>>>>> >>>> Mike, can we use LUCENE-2565 to track this, or would you prefer
>>>>>> >>>> that I open a separate one?
>>>>>> >>>>
>>>>>> >>>> Shai
>>>>>> >>>>
>>>>>> >>>> On Mon, Jul 26, 2010 at 3:26 PM, Michael McCandless
>>>>>> >>>> <luc...@mikemccandless.com> wrote:
>>>>>> >>>>>
>>>>>> >>>>> On a more general note...
>>>>>> >>>>>
>>>>>> >>>>> Any time any of you out there hit an "odd" test failure, please
>>>>>> >>>>> please please do just what Shai did: take it to the dev list!
>>>>>> >>>>>
>>>>>> >>>>> Think of Lucene's unit tests like SETI :)  We are desperately
>>>>>> >>>>> seeking bugs, and you and your machine may just be lucky enough
>>>>>> >>>>> to find one... go forth and buy expensive new power hungry
>>>>>> >>>>> computers just so you can run the random tests over and over,
>>>>>> >>>>> seeking the bugs!
>>>>>> >>>>>
>>>>>> >>>>> But be sure to include that random seed when you do hit a
>>>>>> >>>>> failure...
>>>>>> >>>>>
>>>>>> >>>>> Mike
>>>>>> >>>>>
>>>>>> >>>>> On Mon, Jul 26, 2010 at 8:23 AM, Robert Muir <rcm...@gmail.com>
>>>>>> >>>>> wrote:
>>>>>> >>>>> > I agree, Shai can you open a bug? I cannot reproduce, did you
>>>>>> >>>>> > use an IBM JVM or another environment that might help us figure
>>>>>> >>>>> > it out?
>>>>>> >>>>> >
>>>>>> >>>>> > On Mon, Jul 26, 2010 at 6:29 AM, Michael McCandless
>>>>>> >>>>> > <luc...@mikemccandless.com> wrote:
>>>>>> >>>>> >>
>>>>>> >>>>> >> Hmmm this means a bug is lurking.  This is the power of random
>>>>>> >>>>> >> testing (that every time we all run tests, we're testing
>>>>>> >>>>> >> different "paths" through the code)....
>>>>>> >>>>> >>
>>>>>> >>>>> >> It seems exceptionally unlikely that LUCENE-2537's changes
>>>>>> >>>>> >> would cause this!
>>>>>> >>>>> >>
>>>>>> >>>>> >> But, unfortunately, when I plug that seed in I don't see it
>>>>>> >>>>> >> fail, which is odd.  I'll run a stress test to see if I can
>>>>>> >>>>> >> tickle the bug... can you open a Jira issue so we don't lose
>>>>>> >>>>> >> track?
>>>>>> >>>>> >>
>>>>>> >>>>> >> Mike
>>>>>> >>>>> >>
>>>>>> >>>>> >> On Mon, Jul 26, 2010 at 2:57 AM, Shai Erera <ser...@gmail.com>
>>>>>> >>>>> >> wrote:
>>>>>> >>>>> >> > Hi
>>>>>> >>>>> >> >
>>>>>> >>>>> >> > I was running tests on trunk (after merging the changes from
>>>>>> >>>>> >> > LUCENE-2537) and received this error message:
>>>>>> >>>>> >> >
>>>>>> >>>>> >> > expected:<true> but was:<false>
>>>>>> >>>>> >> >
>>>>>> >>>>> >> > junit.framework.AssertionFailedError: expected:<true> but was:<false>
>>>>>> >>>>> >> >     at org.apache.lucene.util.automaton.TestUTF32ToUTF8.assertAutomaton(TestUTF32ToUTF8.java:197)
>>>>>> >>>>> >> >     at org.apache.lucene.util.automaton.TestUTF32ToUTF8.testRandomRegexes(TestUTF32ToUTF8.java:170)
>>>>>> >>>>> >> >     at org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:285)
>>>>>> >>>>> >> >
>>>>>> >>>>> >> > NOTE: random seed of testcase 'testRandomRegexes' was: 3510820306304573866
>>>>>> >>>>> >> >
>>>>>> >>>>> >> > I'm sure it's related to my changes. Has anyone else seen this
>>>>>> >>>>> >> > before?
>>>>>> >>>>> >> >
>>>>>> >>>>> >> > Shai
>>>>>> >>>>> >> >
>>>>>> >>>>> >>
>>>>>> >>>>> >>
>>>>>> >>>>> >>
>>>>>> >>>>> >> ---------------------------------------------------------------------
>>>>>> >>>>> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>>>>> >>>>> >> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>>>> >>>>> >>
>>>>>> >>>>> >
>>>>>> >>>>> >
>>>>>> >>>>> >
>>>>>> >>>>> > --
>>>>>> >>>>> > Robert Muir
>>>>>> >>>>> > rcm...@gmail.com
>>>>>> >>>>> >
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>
>>>>>> >>>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> --
>>>>>> >> Robert Muir
>>>>>> >> rcm...@gmail.com
>>>>>> >
>>>>>> >
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Robert Muir
>>>>> rcm...@gmail.com
>>>>
>>>
>>>
>>
>

