That's lines 460-497 (not 491) in HTMLStripCharFilter.jflex

On Sun, Nov 23, 2014 at 1:24 PM, Steve Rowe <[email protected]> wrote:
> The matching rule is on lines 460-491 in HTMLStripCharFilter.jflex:
> <https://svn.apache.org/viewvc/lucene/dev/trunk/lucene/analysis/common/src/java/org/apache/lucene/analysis/charfilter/HTMLStripCharFilter.jflex?view=markup#l460>
> - it matches an overly-broad set of char ref pairs, then validates
> correctly paired surrogates, and backtracks if the pair are not valid.
>
> The JDK methods are Integer.parseInt(), Character.isHighSurrogate() and
> Character.isLowSurrogate().
>
> On Sun, Nov 23, 2014 at 1:06 PM, Robert Muir <[email protected]> wrote:
>
>> Is the character processing here all done by the charfilter, or does
>> it use some encoding methods from the JDK?
>>
>> when i looked at it, it looked like a jvm bug.
>>
>> On Sun, Nov 23, 2014 at 1:04 PM, Steve Rowe <[email protected]> wrote:
>> > This is the same line in the same test that failed on Windows under a
>> > 1.8.0_20 JVM five days ago
>> > <http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/4439/>, but in a
>> > different way.
>> >
>> > This test's input is the string "�" - HTML character
>> > references for U+D86D U+E28F - and the expected output is the char sequence
>> > U+FFFD U+E28F (the Unicode replacement character followed by the second
>> > input char).
>> >
>> > In the Windows failure, the output was U+D86D U+E28F (improperly paired high
>> > surrogate).
>> >
>> > In this Linux failure, the output is U+2B68F (properly paired UTF-16 U+D86D
>> > U+DE8F).
>> >
>> > Very weird.
>> >
>> > I'm beasting this suite now on Windows under Oracle JVM 1.8.0_20 to see if I
>> > can get it to fail. No dice so far after 140 trials.
>> >
>> >
>> > On Sun, Nov 23, 2014 at 6:19 AM, Policeman Jenkins Server
>> > <[email protected]> wrote:
>> >>
>> >> Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/11492/
>> >> Java: 32bit/jdk1.8.0_20 -server -XX:+UseParallelGC (asserts: false)
>> >>
>> >> 1 tests failed.
>> >> FAILED: org.apache.lucene.analysis.charfilter.HTMLStripCharFilterTest.testUTF16Surrogates
>> >>
>> >> Error Message:
>> >> term 0 expected:<[�]> but was:<[𫚏]>
>> >>
>> >> Stack Trace:
>> >> org.junit.ComparisonFailure: term 0 expected:<[�]> but was:<[𫚏]>
>> >> at __randomizedtesting.SeedInfo.seed([CF8F65E969B602B9:93CFDF3CEB58ED83]:0)
>> >> at org.junit.Assert.assertEquals(Assert.java:125)
>> >> at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:180)
>> >> at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:295)
>> >> at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:299)
>> >> at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:303)
>> >> at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertAnalyzesTo(BaseTokenStreamTestCase.java:353)
>> >> at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertAnalyzesTo(BaseTokenStreamTestCase.java:362)
>> >> at org.apache.lucene.analysis.charfilter.HTMLStripCharFilterTest.testUTF16Surrogates(HTMLStripCharFilterTest.java:600)
>> >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> >> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> >> at java.lang.reflect.Method.invoke(Method.java:483)
>> >> at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
>> >> at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
>> >> at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
>> >> at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
>> >> at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
>> >> at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
>> >> at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>> >> at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
>> >> at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
>> >> at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>> >> at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>> >> at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
>> >> at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
>> >> at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
>> >> at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
>> >> at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
>> >> at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
>> >> at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
>> >> at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
>> >> at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
>> >> at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>> >> at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
>> >> at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
>> >> at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>> >> at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>> >> at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>> >> at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54)
>> >> at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>> >> at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
>> >> at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
>> >> at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>> >> at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
>> >> at java.lang.Thread.run(Thread.java:745)
>> >>
>> >>
>> >> Build Log:
>> >> [...truncated 5753 lines...]
>> >> [junit4] Suite: org.apache.lucene.analysis.charfilter.HTMLStripCharFilterTest
>> >> [junit4] 2> NOTE: reproduce with: ant test
>> >> -Dtestcase=HTMLStripCharFilterTest -Dtests.method=testUTF16Surrogates
>> >> -Dtests.seed=CF8F65E969B602B9 -Dtests.multiplier=3 -Dtests.slow=true
>> >> -Dtests.locale=th_TH -Dtests.timezone=PLT -Dtests.asserts=false
>> >> -Dtests.file.encoding=UTF-8
>> >> [junit4] FAILURE 0.07s J0 | HTMLStripCharFilterTest.testUTF16Surrogates <<<
>> >> [junit4] > Throwable #1: org.junit.ComparisonFailure: term 0 expected:<[�]> but was:<[𫚏]>
>> >> [junit4] > at __randomizedtesting.SeedInfo.seed([CF8F65E969B602B9:93CFDF3CEB58ED83]:0)
>> >> [junit4] > at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:180)
>> >> [junit4] > at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:295)
>> >> [junit4] > at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:299)
>> >> [junit4] > at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:303)
>> >> [junit4] > at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertAnalyzesTo(BaseTokenStreamTestCase.java:353)
>> >> [junit4] > at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertAnalyzesTo(BaseTokenStreamTestCase.java:362)
>> >> [junit4] > at org.apache.lucene.analysis.charfilter.HTMLStripCharFilterTest.testUTF16Surrogates(HTMLStripCharFilterTest.java:600)
>> >> [junit4] > at java.lang.Thread.run(Thread.java:745)
>> >> [junit4] 2> NOTE: test params are: codec=Asserting(Lucene50):
>> >> {dummy=BlockTreeOrds(blocksize=128)}, docValues:{}, sim=DefaultSimilarity,
>> >> locale=th_TH, timezone=PLT
>> >> [junit4] 2> NOTE: Linux 3.13.0-39-generic i386/Oracle Corporation
>> >> 1.8.0_20 (32-bit)/cpus=8,threads=1,free=88329216,total=222035968
>> >> [junit4] 2> NOTE: All tests run in this JVM:
>> >> [TestPatternReplaceCharFilter, TestArabicNormalizationFilter,
>> >> TestPatternReplaceCharFilterFactory, TestWikipediaTokenizerFactory,
>> >> TestCondition2, TestIrishLowerCaseFilterFactory, TestGalicianStemFilter,
>> >> TestWordlistLoader, TestElisionFilterFactory, TestLengthFilter,
>> >> TestGermanLightStemFilterFactory, EdgeNGramTokenFilterTest,
>> >> TestSerbianNormalizationFilterFactory, TestPortugueseLightStemFilter,
>> >> TestSwedishLightStemFilterFactory, TestPatternReplaceFilterFactory,
>> >> TestElision, TestCzechStemFilterFactory, TestSpanishLightStemFilter,
>> >> TestSingleTokenTokenFilter, TestHindiStemmer, TestKeepWordFilter,
>> >> TestLimitTokenCountFilter, TestShingleFilterFactory, TestTrimFilter,
>> >> TestCapitalizationFilterFactory, TestFactories,
>> >> TestGalicianMinimalStemFilterFactory, TestFlagLong, TestIgnore,
>> >> TestGermanMinimalStemFilterFactory, TestUAX29URLEmailTokenizerFactory,
>> >> TestPatternCaptureGroupTokenFilter, TestAlternateCasing, TestCzechAnalyzer,
>> >> TestOnlyInCompound, TestPersianNormalizationFilter,
>> >> TestGermanNormalizationFilterFactory, WikipediaTokenizerTest,
>> >> TestMultiWordSynonyms, TestTruncateTokenFilter, TestPersianAnalyzer,
>> >> TestArabicAnalyzer, TestRemoveDuplicatesTokenFilter,
>> >> TestSoraniStemFilterFactory, TestPorterStemFilterFactory,
>> >> TestCodepointCountFilterFactory, TokenTypeSinkTokenizerTest,
>> >> TestSoraniAnalyzer, TestApostropheFilter, QueryAutoStopWordAnalyzerTest,
>> >> TestTwoSuffixes, TestScandinavianFoldingFilterFactory, TestArmenianAnalyzer,
>> >> TestFinnishAnalyzer, TestFlagNum, TestIndonesianStemmer,
>> >> TestLimitTokenCountAnalyzer, TestScandinavianNormalizationFilterFactory,
>> >> TestReversePathHierarchyTokenizer, TestGalicianMinimalStemFilter,
>> >> TestPersianNormalizationFilterFactory, TestNeedAffix,
>> >> TestGermanLightStemFilter, TestLimitTokenPositionFilterFactory,
>> >> TestStopFilterFactory, TestMappingCharFilter, HTMLStripCharFilterTest]
>> >> [junit4] Completed on J0 in 2.12s, 31 tests, 1 failure <<< FAILURES!
>> >>
>> >> [...truncated 403 lines...]
>> >> BUILD FAILED
>> >> /mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/build.xml:525: The following error occurred while executing this line:
>> >> /mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/build.xml:473: The following error occurred while executing this line:
>> >> /mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/build.xml:61: The following error occurred while executing this line:
>> >> /mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/extra-targets.xml:39: The following error occurred while executing this line:
>> >> /mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/lucene/build.xml:452: The following error occurred while executing this line:
>> >> /mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/lucene/common-build.xml:2141: The following error occurred while executing this line:
>> >> /mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/lucene/analysis/build.xml:106: The following error occurred while executing this line:
>> >> /mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/lucene/analysis/build.xml:38: The following error occurred while executing this line:
>> >> /mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/lucene/module-build.xml:58: The following error occurred while executing this line:
>> >> /mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/lucene/common-build.xml:1359: The following error occurred while executing this line:
>> >> /mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/lucene/common-build.xml:966: There were test failures: 270 suites, 1408 tests, 1 failure, 1 ignored
>> >>
>> >> Total time: 30 minutes 5 seconds
>> >> Build step 'Invoke Ant' marked build as failure
>> >> [description-setter] Description set: Java: 32bit/jdk1.8.0_20 -server -XX:+UseParallelGC (asserts: false)
>> >> Archiving artifacts
>> >> Recording test results
>> >> Email was triggered for: Failure - Any
>> >> Sending email for trigger: Failure - Any
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: [email protected]
>> >> For additional commands, e-mail: [email protected]
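
For anyone following along, here is a minimal Java sketch of the validation step described in the thread: parse two numeric character references with Integer.parseInt(), then check the pair with Character.isHighSurrogate() and Character.isLowSurrogate(). This is not the code generated from HTMLStripCharFilter.jflex. The class name SurrogatePairSketch and the helpers parseHexCharRef, decodePair and toHex are made up for illustration, and the sketch assumes both references use the hex form (&#x....;) with at most four hex digits. Replacing an unpaired first surrogate with U+FFFD and keeping the second char is taken from the expected output quoted in the thread; passing the second char through unchanged is a simplification.

```java
final class SurrogatePairSketch {

    // Hypothetical helper: take the hex digits out of a reference such as
    // "&#xD86D;" and parse them with Integer.parseInt().
    // Assumes the "&#x...;" form with at most four hex digits.
    static char parseHexCharRef(String ref) {
        String digits = ref.substring(3, ref.length() - 1);
        return (char) Integer.parseInt(digits, 16);
    }

    // Hypothetical helper: decode two adjacent character references.
    static String decodePair(String firstRef, String secondRef) {
        char first = parseHexCharRef(firstRef);
        char second = parseHexCharRef(secondRef);
        if (Character.isHighSurrogate(first) && Character.isLowSurrogate(second)) {
            // Valid pair: keep both UTF-16 code units.
            return new String(new char[] { first, second });
        }
        // Not a valid pair: an unpaired surrogate in first position becomes
        // U+FFFD, and the second char is passed through unchanged.
        char replaced = Character.isSurrogate(first) ? '\uFFFD' : first;
        return new String(new char[] { replaced, second });
    }

    // Render each UTF-16 code unit as U+XXXX, separated by spaces.
    static String toHex(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Input from the thread: D86D is a high surrogate, E28F is not a low surrogate.
        System.out.println(toHex(decodePair("&#xD86D;", "&#xE28F;"))); // U+FFFD U+E28F

        // A properly paired input: D86D DE8F encodes the code point 2B68F.
        String paired = decodePair("&#xD86D;", "&#xDE8F;");
        System.out.println(String.format("U+%X", paired.codePointAt(0))); // U+2B68F
    }
}
```

The second call shows the arithmetic behind the Linux failure: the reported output U+2B68F is what a correctly paired U+D86D U+DE8F decodes to, whereas the test input's second char is U+E28F.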
