[
https://issues.apache.org/jira/browse/SOLR-12524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16593273#comment-16593273
]
Amrit Sarkar edited comment on SOLR-12524 at 8/27/18 7:47 AM:
--------------------------------------------------------------
Attached an updated patch; the explanation follows. Please correct me if I am
wrong.
There is a single assertion in CdcrUpdateLog failing after SOLR-9922, and the
failure is strictly harmless. Since we got rid of the
{{TransactionLog:snapshot()}} and {{TransactionLog:rollback()}} functions, the
Cdcr buffer-updates functionality was altered a bit in terms of
{{recoveryInfo.positionOfStart}}.
In the function {{CdcrUpdateLog:forwardSeek}}, tlogs whose entries have been
forwarded to the target are de-referenced. The assertion:
{code:java}
assert this.tlogs.peekLast().id == subReader.tlogs.peekLast().id :
this.tlogs.peekLast().id+" != "+subReader.tlogs.peekLast().id;
{code}
validates that we have purged all tlogs which we don't want to keep anymore
(they have been forwarded); subReader is the mainTlogReader itself. However,
after SOLR-9922, tlogs are no longer buffered the way they were before when
cores are in recovery (please correct me, as I don't understand every nuance
of tlogs very well), so {{this.tlogs.peekLast().id}} can be greater than
{{subReader.tlogs.peekLast().id}}. That means all useless tlogs are already
purged and {{forwardSeek}} doesn't have to do anything, which is fine as long
as no updates are missed.
If we change the assertion to
{{this.tlogs.peekLast().id >= subReader.tlogs.peekLast().id}}, meaning that
this while loop in {{forwardSeek}} is, rightly, not executed in that case;
{code:java}
while (this.tlogs.peekLast().id < subReader.tlogs.peekLast().id) {
tlogs.removeLast();
currentTlog = tlogs.peekLast();
}
{code}
then everything is fine and all tests pass.
I ran 300 rounds of beasting with this assertion on {{CdcrBidirectionalTest}}
and all of them passed.
I missed this bug in SOLR-9922 because I didn't expect the above scenario to
happen; even when it does happen, it is legitimate and OK.
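To make the two cases concrete, here is a minimal, self-contained sketch of the check described above. It is not the actual CdcrUpdateLog code: the {{Tlog}} and {{Reader}} classes, the return value of {{forwardSeek}} and the {{strictHolds}}/{{relaxedHolds}} helpers are hypothetical stand-ins, and it assumes that the last element of the deque is the oldest tlog (lowest id) and that the deque never becomes empty.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ForwardSeekSketch {

    /** Hypothetical stand-in for a transaction log; only the id matters here. */
    public static final class Tlog {
        public final long id;
        public Tlog(long id) { this.id = id; }
    }

    /** Hypothetical stand-in for a log reader that holds a deque of tlogs. */
    public static final class Reader {
        public final Deque<Tlog> tlogs = new ArrayDeque<>();
        public Tlog currentTlog;

        /** Ids are given oldest first; assumption: the oldest tlog sits at the tail. */
        public Reader(long... ids) {
            for (long id : ids) {
                tlogs.addFirst(new Tlog(id));
            }
            currentTlog = tlogs.peekLast();
        }

        /** Same loop as in the comment; returns how many tlogs were dropped. */
        public int forwardSeek(Reader subReader) {
            int removed = 0;
            while (this.tlogs.peekLast().id < subReader.tlogs.peekLast().id) {
                tlogs.removeLast();
                currentTlog = tlogs.peekLast();
                removed++;
            }
            return removed;
        }
    }

    /** The original check: both readers must point at the same oldest tlog. */
    public static boolean strictHolds(Reader r, Reader sub) {
        return r.tlogs.peekLast().id == sub.tlogs.peekLast().id;
    }

    /** The proposed check: the reader may already be ahead of the sub-reader. */
    public static boolean relaxedHolds(Reader r, Reader sub) {
        return r.tlogs.peekLast().id >= sub.tlogs.peekLast().id;
    }

    public static void main(String[] args) {
        // Case 1: the reader is behind, so the loop purges tlogs 1 and 2.
        Reader behind = new Reader(1, 2, 3, 4);
        Reader sub = new Reader(3, 4);
        System.out.println("removed=" + behind.forwardSeek(sub)
            + " strict=" + strictHolds(behind, sub)
            + " relaxed=" + relaxedHolds(behind, sub));

        // Case 2: the reader is already ahead, so the loop does nothing.
        Reader ahead = new Reader(5, 6);
        Reader sub2 = new Reader(3, 4, 5, 6);
        System.out.println("removed=" + ahead.forwardSeek(sub2)
            + " strict=" + strictHolds(ahead, sub2)
            + " relaxed=" + relaxedHolds(ahead, sub2));
    }
}
```

In the second case the loop body never runs, the {{==}} check fails and the {{>=}} check holds, which is the situation the relaxed assertion is meant to tolerate.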
> CdcrBidirectionalTest.testBiDir() regularly fails
> -------------------------------------------------
>
> Key: SOLR-12524
> URL: https://issues.apache.org/jira/browse/SOLR-12524
> Project: Solr
> Issue Type: Test
> Components: CDCR, Tests
> Reporter: Christine Poerschke
> Priority: Major
> Attachments: SOLR-12524.patch, SOLR-12524.patch, SOLR-12524.patch,
> SOLR-12524.patch, SOLR-12524.patch, beast-test-run
>
>
> e.g. from
> https://jenkins.thetaphi.de/job/Lucene-Solr-master-MacOSX/4701/consoleText
> {code}
> [junit4] ERROR 20.4s J0 | CdcrBidirectionalTest.testBiDir <<<
> [junit4] > Throwable #1:
> com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an
> uncaught exception in thread: Thread[id=28371,
> name=cdcr-replicator-11775-thread-1, state=RUNNABLE,
> group=TGRP-CdcrBidirectionalTest]
> [junit4] > at
> __randomizedtesting.SeedInfo.seed([CA5584AC7009CD50:8F8E744E68278112]:0)
> [junit4] > Caused by: java.lang.AssertionError
> [junit4] > at
> __randomizedtesting.SeedInfo.seed([CA5584AC7009CD50]:0)
> [junit4] > at
> org.apache.solr.update.CdcrUpdateLog$CdcrLogReader.forwardSeek(CdcrUpdateLog.java:611)
> [junit4] > at
> org.apache.solr.handler.CdcrReplicator.run(CdcrReplicator.java:125)
> [junit4] > at
> org.apache.solr.handler.CdcrReplicatorScheduler.lambda$null$0(CdcrReplicatorScheduler.java:81)
> [junit4] > at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
> [junit4] > at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [junit4] > at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [junit4] > at java.lang.Thread.run(Thread.java:748)
> {code}