The matching rule is on lines 460-491 in HTMLStripCharFilter.jflex:
<https://svn.apache.org/viewvc/lucene/dev/trunk/lucene/analysis/common/src/java/org/apache/lucene/analysis/charfilter/HTMLStripCharFilter.jflex?view=markup#l460>.
It matches an overly broad set of char ref pairs, then checks whether the
surrogates are correctly paired, and backtracks if the pair is not valid.

The JDK methods are Integer.parseInt(), Character.isHighSurrogate() and
Character.isLowSurrogate().
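
To make the validation step concrete, here is a rough sketch of how those
three JDK methods can be combined to check a decimal char ref pair. This is
not the code from HTMLStripCharFilter.jflex: the class name, the decodePair()
helper and the hex() helper are all made up for illustration, and it only
handles the "high surrogate followed by something that is not a low
surrogate" case that the failing test exercises.

```java
// Illustrative sketch only: SurrogatePairSketch, decodePair and hex are
// hypothetical names, not code taken from HTMLStripCharFilter.jflex.
class SurrogatePairSketch {

    static final char REPLACEMENT = '\uFFFD';

    // Takes the decimal digits of two numeric character references
    // (for "&#55404;&#57999;" that is "55404" and "57999").
    static String decodePair(String firstDigits, String secondDigits) {
        char first = (char) Integer.parseInt(firstDigits);
        char second = (char) Integer.parseInt(secondDigits);
        if (Character.isHighSurrogate(first) && !Character.isLowSurrogate(second)) {
            // Unpaired high surrogate: substitute the replacement character
            // and keep the second char as-is.
            return new String(new char[] { REPLACEMENT, second });
        }
        // Otherwise pass both chars through unchanged (assumption: other
        // malformed combinations are out of scope for this sketch).
        return new String(new char[] { first, second });
    }

    // Formats each UTF-16 code unit as upper-case hex, space separated.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(Integer.toHexString(s.charAt(i)).toUpperCase());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The input from the failing test: second ref is not a low surrogate.
        System.out.println(hex(decodePair("55404", "57999")));
        // A properly paired high/low surrogate, for comparison.
        System.out.println(hex(decodePair("55405", "56975")));
    }
}
```

With the test's input the second value is outside the low surrogate range
(U+DC00 to U+DFFF), so the sketch takes the replacement branch, which is the
behavior the test expects from the char filter.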

On Sun, Nov 23, 2014 at 1:06 PM, Robert Muir <[email protected]> wrote:

> Is the character processing here all done by the charfilter, or does
> it use some encoding methods from the JDK?
>
> when i looked at it, it looked like a jvm bug.
>
> On Sun, Nov 23, 2014 at 1:04 PM, Steve Rowe <[email protected]> wrote:
> > This is the same line in the same test that failed on Windows under a
> > 1.8.0_20 JVM five days ago
> > <http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/4439/>, but in a
> > different way.
> >
> > This test's input is the string "&#55404;&#57999;" - HTML character
> > references for U+D86D U+E28F - and the expected output is the char sequence
> > U+FFFD U+E28F (the Unicode replacement character followed by the second
> > input char).
> >
> > In the Windows failure, the output was U+D86D U+E28F (improperly paired high
> > surrogate).
> >
> > In this Linux failure, the output is U+2B68F (properly paired UTF-16 U+D86D
> > U+DE8F).
> >
> > Very weird.
> >
> > I'm beasting this suite now on Windows under Oracle JVM 1.8.0_20 to see if I
> > can get it to fail.  No dice so far after 140 trials.
> >
> >
> > On Sun, Nov 23, 2014 at 6:19 AM, Policeman Jenkins Server
> > <[email protected]> wrote:
> >>
> >> Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/11492/
> >> Java: 32bit/jdk1.8.0_20 -server -XX:+UseParallelGC (asserts: false)
> >>
> >> 1 tests failed.
> >> FAILED: org.apache.lucene.analysis.charfilter.HTMLStripCharFilterTest.testUTF16Surrogates
> >>
> >> Error Message:
> >> term 0 expected:<[�]> but was:<[𫚏]>
> >>
> >> Stack Trace:
> >> org.junit.ComparisonFailure: term 0 expected:<[�]> but was:<[𫚏]>
> >>         at __randomizedtesting.SeedInfo.seed([CF8F65E969B602B9:93CFDF3CEB58ED83]:0)
> >>         at org.junit.Assert.assertEquals(Assert.java:125)
> >>         at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:180)
> >>         at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:295)
> >>         at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:299)
> >>         at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:303)
> >>         at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertAnalyzesTo(BaseTokenStreamTestCase.java:353)
> >>         at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertAnalyzesTo(BaseTokenStreamTestCase.java:362)
> >>         at org.apache.lucene.analysis.charfilter.HTMLStripCharFilterTest.testUTF16Surrogates(HTMLStripCharFilterTest.java:600)
> >>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> >>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >>         at java.lang.reflect.Method.invoke(Method.java:483)
> >>         at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
> >>         at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
> >>         at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
> >>         at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
> >>         at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
> >>         at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
> >>         at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
> >>         at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
> >>         at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
> >>         at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
> >>         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> >>         at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
> >>         at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
> >>         at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
> >>         at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
> >>         at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
> >>         at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
> >>         at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
> >>         at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
> >>         at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
> >>         at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
> >>         at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
> >>         at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
> >>         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> >>         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> >>         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> >>         at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54)
> >>         at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
> >>         at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
> >>         at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
> >>         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> >>         at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
> >>         at java.lang.Thread.run(Thread.java:745)
> >>
> >>
> >>
> >>
> >> Build Log:
> >> [...truncated 5753 lines...]
> >>    [junit4] Suite: org.apache.lucene.analysis.charfilter.HTMLStripCharFilterTest
> >>    [junit4]   2> NOTE: reproduce with: ant test -Dtestcase=HTMLStripCharFilterTest -Dtests.method=testUTF16Surrogates -Dtests.seed=CF8F65E969B602B9 -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=th_TH -Dtests.timezone=PLT -Dtests.asserts=false -Dtests.file.encoding=UTF-8
> >>    [junit4] FAILURE 0.07s J0 | HTMLStripCharFilterTest.testUTF16Surrogates <<<
> >>    [junit4]    > Throwable #1: org.junit.ComparisonFailure: term 0 expected:<[�]> but was:<[𫚏]>
> >>    [junit4]    >        at __randomizedtesting.SeedInfo.seed([CF8F65E969B602B9:93CFDF3CEB58ED83]:0)
> >>    [junit4]    >        at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:180)
> >>    [junit4]    >        at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:295)
> >>    [junit4]    >        at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:299)
> >>    [junit4]    >        at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:303)
> >>    [junit4]    >        at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertAnalyzesTo(BaseTokenStreamTestCase.java:353)
> >>    [junit4]    >        at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertAnalyzesTo(BaseTokenStreamTestCase.java:362)
> >>    [junit4]    >        at org.apache.lucene.analysis.charfilter.HTMLStripCharFilterTest.testUTF16Surrogates(HTMLStripCharFilterTest.java:600)
> >>    [junit4]    >        at java.lang.Thread.run(Thread.java:745)
> >>    [junit4]   2> NOTE: test params are: codec=Asserting(Lucene50): {dummy=BlockTreeOrds(blocksize=128)}, docValues:{}, sim=DefaultSimilarity, locale=th_TH, timezone=PLT
> >>    [junit4]   2> NOTE: Linux 3.13.0-39-generic i386/Oracle Corporation 1.8.0_20 (32-bit)/cpus=8,threads=1,free=88329216,total=222035968
> >>    [junit4]   2> NOTE: All tests run in this JVM:
> >> [TestPatternReplaceCharFilter, TestArabicNormalizationFilter,
> >> TestPatternReplaceCharFilterFactory, TestWikipediaTokenizerFactory,
> >> TestCondition2, TestIrishLowerCaseFilterFactory, TestGalicianStemFilter,
> >> TestWordlistLoader, TestElisionFilterFactory, TestLengthFilter,
> >> TestGermanLightStemFilterFactory, EdgeNGramTokenFilterTest,
> >> TestSerbianNormalizationFilterFactory, TestPortugueseLightStemFilter,
> >> TestSwedishLightStemFilterFactory, TestPatternReplaceFilterFactory,
> >> TestElision, TestCzechStemFilterFactory, TestSpanishLightStemFilter,
> >> TestSingleTokenTokenFilter, TestHindiStemmer, TestKeepWordFilter,
> >> TestLimitTokenCountFilter, TestShingleFilterFactory, TestTrimFilter,
> >> TestCapitalizationFilterFactory, TestFactories,
> >> TestGalicianMinimalStemFilterFactory, TestFlagLong, TestIgnore,
> >> TestGermanMinimalStemFilterFactory, TestUAX29URLEmailTokenizerFactory,
> >> TestPatternCaptureGroupTokenFilter, TestAlternateCasing, TestCzechAnalyzer,
> >> TestOnlyInCompound, TestPersianNormalizationFilter,
> >> TestGermanNormalizationFilterFactory, WikipediaTokenizerTest,
> >> TestMultiWordSynonyms, TestTruncateTokenFilter, TestPersianAnalyzer,
> >> TestArabicAnalyzer, TestRemoveDuplicatesTokenFilter,
> >> TestSoraniStemFilterFactory, TestPorterStemFilterFactory,
> >> TestCodepointCountFilterFactory, TokenTypeSinkTokenizerTest,
> >> TestSoraniAnalyzer, TestApostropheFilter, QueryAutoStopWordAnalyzerTest,
> >> TestTwoSuffixes, TestScandinavianFoldingFilterFactory, TestArmenianAnalyzer,
> >> TestFinnishAnalyzer, TestFlagNum, TestIndonesianStemmer,
> >> TestLimitTokenCountAnalyzer, TestScandinavianNormalizationFilterFactory,
> >> TestReversePathHierarchyTokenizer, TestGalicianMinimalStemFilter,
> >> TestPersianNormalizationFilterFactory, TestNeedAffix,
> >> TestGermanLightStemFilter, TestLimitTokenPositionFilterFactory,
> >> TestStopFilterFactory, TestMappingCharFilter, HTMLStripCharFilterTest]
> >>    [junit4] Completed on J0 in 2.12s, 31 tests, 1 failure <<< FAILURES!
> >>
> >> [...truncated 403 lines...]
> >> BUILD FAILED
> >> /mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/build.xml:525: The following error occurred while executing this line:
> >> /mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/build.xml:473: The following error occurred while executing this line:
> >> /mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/build.xml:61: The following error occurred while executing this line:
> >> /mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/extra-targets.xml:39: The following error occurred while executing this line:
> >> /mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/lucene/build.xml:452: The following error occurred while executing this line:
> >> /mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/lucene/common-build.xml:2141: The following error occurred while executing this line:
> >> /mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/lucene/analysis/build.xml:106: The following error occurred while executing this line:
> >> /mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/lucene/analysis/build.xml:38: The following error occurred while executing this line:
> >> /mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/lucene/module-build.xml:58: The following error occurred while executing this line:
> >> /mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/lucene/common-build.xml:1359: The following error occurred while executing this line:
> >> /mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/lucene/common-build.xml:966: There were test failures: 270 suites, 1408 tests, 1 failure, 1 ignored
> >>
> >> Total time: 30 minutes 5 seconds
> >> Build step 'Invoke Ant' marked build as failure
> >> [description-setter] Description set: Java: 32bit/jdk1.8.0_20 -server -XX:+UseParallelGC (asserts: false)
> >> Archiving artifacts
> >> Recording test results
> >> Email was triggered for: Failure - Any
> >> Sending email for trigger: Failure - Any
> >>
> >>
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [email protected]
> >> For additional commands, e-mail: [email protected]
> >
> >
>
