[
https://issues.apache.org/jira/browse/LUCENE-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16784997#comment-16784997
]
Hoss Man commented on LUCENE-8692:
----------------------------------
bq. (see also my nocommit comments about the existing tragicEvent() call in
prepareCommitInternal() ... but that hadn't triggered any failures in the test
so I hadn't touched it)
I spoke too soon -- beasting just turned up this interesting little situation...
{noformat}
hossman@tray:~/lucene/dev/lucene/core [master] $ ant beast -Dbeast.iters=100
-Dtests.iters=100 -Dtestcase=TestStressIndexing2
-Dtests.method=testRandomCorruptionIsTragic\*
...
[beaster] Beast round 34 results:
/home/hossman/lucene/dev/lucene/build/core/test/34
[beaster] The following error occurred while executing this line:
[beaster] /home/hossman/lucene/dev/lucene/common-build.xml:1572: The
following error occurred while executing this line:
[beaster] /home/hossman/lucene/dev/lucene/common-build.xml:1099: There were
test failures: 1 suite, 100 tests, 1 failure [seed: CABE666E4674CFB2]
[beaster] Executing 1 suite with 1 JVM.
[beaster]
[beaster] Started J0 PID(10111@localhost).
[beaster] 2> NOTE: reproduce with: ant test -Dtestcase=TestStressIndexing2
-Dtests.method=testRandomCorruptionIsTragic -Dtests.seed=CABE666E4674CFB2
-Dtests.slow=true -Dtests.badapples=true -Dtests.locale=cs
-Dtests.timezone=America/Nipigon -Dtests.asserts=true
-Dtests.file.encoding=UTF-8
[beaster] [15:50:16.736] FAILURE 0.02s |
TestStressIndexing2.testRandomCorruptionIsTragic
{seed=[CABE666E4674CFB2:682DC0F2BA2A235F]} <<<
[beaster] > Throwable #1: java.lang.AssertionError: index update
encountered throwable, but no tragic event recorded: java.lang.AssertionError
[beaster] > at
__randomizedtesting.SeedInfo.seed([CABE666E4674CFB2:682DC0F2BA2A235F]:0)
[beaster] > at org.junit.Assert.fail(Assert.java:88)
[beaster] > at org.junit.Assert.assertTrue(Assert.java:41)
[beaster] > at org.junit.Assert.assertNotNull(Assert.java:712)
[beaster] > at
org.apache.lucene.index.TestStressIndexing2$CorruptibleIndexingThread.run(TestStressIndexing2.java:1019)
[beaster] > at
org.apache.lucene.index.TestStressIndexing2.testRandomCorruptionIsTragic(TestStressIndexing2.java:144)
[beaster] > at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown
Source)
[beaster] > at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[beaster] > at java.lang.reflect.Method.invoke(Method.java:498)
[beaster] > at
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
[beaster] > at
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
[beaster] > at
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
[beaster] > at
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
[beaster] > at
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
[beaster] > at
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
[beaster] > at
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
[beaster] > at
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
[beaster] > at
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
[beaster] > at
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
[beaster] > at
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
[beaster] > at
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
[beaster] > at
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
[beaster] > at
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
[beaster] > at
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
[beaster] > at
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
[beaster] > at
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
[beaster] > at
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
[beaster] > at
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
[beaster] > at
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
[beaster] > at
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
[beaster] > at
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
[beaster] > at
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
[beaster] > at
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
[beaster] > at
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
[beaster] > at
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
[beaster] > at
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
[beaster] > at
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
[beaster] > at
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
[beaster] > at
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
[beaster] > at
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
[beaster] > at java.lang.Thread.run(Thread.java:748)
[beaster] > Suppressed: java.lang.AssertionError
[beaster] > at
org.apache.lucene.codecs.simpletext.SimpleTextDocValuesReader.getNumericNonIterator(SimpleTextDocValuesReader.java:184)
[beaster] > at
org.apache.lucene.codecs.simpletext.SimpleTextDocValuesReader.getNumeric(SimpleTextDocValuesReader.java:142)
[beaster] > at
org.apache.lucene.index.CodecReader.getNumericDocValues(CodecReader.java:137)
[beaster] > at
org.apache.lucene.index.ReadersAndUpdates$2.getNumeric(ReadersAndUpdates.java:373)
[beaster] > at
org.apache.lucene.codecs.simpletext.SimpleTextDocValuesWriter.addNumericField(SimpleTextDocValuesWriter.java:88)
[beaster] > at
org.apache.lucene.index.ReadersAndUpdates.handleDVUpdates(ReadersAndUpdates.java:368)
[beaster] > at
org.apache.lucene.index.ReadersAndUpdates.writeFieldUpdates(ReadersAndUpdates.java:570)
[beaster] > at
org.apache.lucene.index.ReaderPool.commit(ReaderPool.java:325)
[beaster] > at
org.apache.lucene.index.IndexWriter.writeReaderPool(IndexWriter.java:3313)
[beaster] > at
org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3222)
[beaster] > at
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3451)
[beaster] > at
org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3416)
[beaster] > at
org.apache.lucene.index.TestStressIndexing2$CorruptibleIndexingThread.run(TestStressIndexing2.java:1008)
[beaster] > ... 36 more
[beaster] 2> NOTE: leaving temporary files on disk at:
/home/hossman/lucene/dev/lucene/build/core/test/J0/temp/lucene.index.TestStressIndexing2_CABE666E4674CFB2-001
[beaster] 2> NOTE: test params are: codec=SimpleText,
sim=Asserting(org.apache.lucene.search.similarities.AssertingSimilarity@40b1325d),
locale=cs, timezone=America/Nipigon
[beaster] 2> NOTE: Linux 3.19.0-84-generic amd64/Oracle Corporation
1.8.0_144 (64-bit)/cpus=4,threads=1,free=203261928,total=249561088
[beaster] 2> NOTE: All tests run in this JVM: [TestStressIndexing2]
[beaster]
[beaster] Tests with failures [seed: CABE666E4674CFB2]:
[beaster] -
org.apache.lucene.index.TestStressIndexing2.testRandomCorruptionIsTragic
{seed=[CABE666E4674CFB2:682DC0F2BA2A235F]}
{noformat}
...I'm not sure how/why that assertion would have tripped, let alone if/when
AssertionErrors should be treated as tragic?
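For anyone reading the trace above: the outer AssertionError is the test's own "no tragic event recorded" check, and the "Suppressed:" entry is the throwable that actually came out of the commit. The following is a minimal sketch of how a check could produce a failure of that shape, assuming the original throwable is attached via {{Throwable.addSuppressed}}. {{TragicCheckSketch}}, {{IndexUpdate}} and {{check}} are hypothetical names for illustration, not the code in TestStressIndexing2.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the check in the test's indexing thread; not the actual test code.
final class TragicCheckSketch {

    /** Minimal action type that may throw anything (hypothetical). */
    interface IndexUpdate {
        void run() throws Throwable;
    }

    /**
     * Runs an index update. If it throws and no tragic event is reported,
     * returns an AssertionError carrying the original throwable as suppressed.
     * Returns null when there is nothing to report.
     */
    static AssertionError check(IndexUpdate update, Supplier<Throwable> tragicEvent) {
        try {
            update.run();
            return null; // update succeeded
        } catch (Throwable t) {
            if (tragicEvent.get() != null) {
                return null; // the failure was recorded as tragic, as expected
            }
            AssertionError failure = new AssertionError(
                "index update encountered throwable, but no tragic event recorded: " + t);
            failure.addSuppressed(t); // keeps the root cause visible under "Suppressed:"
            return failure;
        }
    }

    public static void main(String[] args) {
        // The update throws a message-less AssertionError and nothing is recorded as tragic.
        AssertionError e = check(() -> { throw new AssertionError(); }, () -> null);
        e.printStackTrace();
    }
}
```

In this sketch the thrown AssertionError has no message, so its string form is just the class name, which is why the outer message ends in "java.lang.AssertionError".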
> IndexWriter.getTragicException() may not reflect all corrupting exceptions
> (notably: NoSuchFileException)
> ---------------------------------------------------------------------------------------------------------
>
> Key: LUCENE-8692
> URL: https://issues.apache.org/jira/browse/LUCENE-8692
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Hoss Man
> Priority: Major
> Attachments: LUCENE-8692.patch, LUCENE-8692.patch, LUCENE-8692.patch,
> LUCENE-8692_test.patch
>
>
> Backstory...
> Solr has a "LeaderTragicEventTest" which uses MockDirectoryWrapper's
> {{corruptFiles}} to introduce corruption into the "leader" node's index and
> then assert that this Solr node gives up its leadership of the shard and
> another replica takes over.
> This can currently fail sporadically (but usually reproducibly -
> see SOLR-13237) due to the leader not giving up its leadership even after the
> corruption causes an update/commit to fail. Solr's leadership code makes
> this decision after encountering an exception from the IndexWriter based on
> whether {{IndexWriter.getTragicException()}} is (non-)null.
> ----
> While investigating this, I created an isolated Lucene-Core equivalent test
> that demonstrates the same basic situation:
> * Gradually cause corruption on an index until (otherwise) valid execution
> of IW.add() + IW.commit() calls throw an exception to the IW client.
> * assert that if an exception is thrown to the IW client,
> {{getTragicException()}} is now non-null.
> It's fairly easy to make my new test fail reproducibly -- in every situation
> I've seen the underlying exception is a {{NoSuchFileException}} (ie: the
> randomly introduced corruption was to delete some file).
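The contract described in the issue can be illustrated with a toy model. This is a sketch under stated assumptions only: {{ToyWriter}}, its {{recordsMissingFiles}} flag, and the file name are hypothetical and are not Lucene's IndexWriter or its behavior. It shows why a caller that decides based on {{getTragicException()}} being non-null does nothing when a NoSuchFileException reaches it without being recorded.

```java
import java.io.IOException;
import java.nio.file.NoSuchFileException;

// Toy model only: ToyWriter is hypothetical and is not Lucene's IndexWriter.
final class ToyWriter {
    private final boolean recordsMissingFiles;
    private Throwable tragic; // stays null until a failure is recorded

    ToyWriter(boolean recordsMissingFiles) {
        this.recordsMissingFiles = recordsMissingFiles;
    }

    /** Simulates a commit against an index whose file was deleted. */
    void commit() throws IOException {
        IOException e = new NoSuchFileException("_0.cfs"); // made-up file name
        if (recordsMissingFiles) {
            tragic = e; // record the failure before rethrowing it
        }
        throw e;
    }

    Throwable getTragicException() {
        return tragic;
    }

    /** Mirrors the caller-side decision: react only if the failure was recorded as tragic. */
    static boolean failedTragically(ToyWriter w) {
        try {
            w.commit();
            return false;
        } catch (IOException e) {
            return w.getTragicException() != null;
        }
    }

    public static void main(String[] args) {
        System.out.println(failedTragically(new ToyWriter(true)));  // prints "true"
        System.out.println(failedTragically(new ToyWriter(false))); // prints "false"
    }
}
```

The second call is the situation the issue describes: the client sees an exception, but nothing was recorded, so a caller keyed on the tragic exception takes no action.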