[ 
https://issues.apache.org/jira/browse/SOLR-12524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16593273#comment-16593273
 ] 

Amrit Sarkar edited comment on SOLR-12524 at 8/27/18 7:47 AM:
--------------------------------------------------------------

I have attached an updated patch; the explanation follows. Please correct me 
if I am wrong.

A single assertion in CdcrUpdateLog has been failing since SOLR-9922; the 
failure is strictly harmless. Since we got rid of the 
{{TransactionLog:snapshot()}} and {{TransactionLog:rollback()}} functions, the 
CDCR buffer-updates functionality changed slightly in terms of 
{{recoveryInfo.positionOfStart}}.
 The function {{CdcrUpdateLog:forwardSeek}} de-references tlogs whose entries 
have already been forwarded to the target. The assertion:
{code:java}
      assert this.tlogs.peekLast().id == subReader.tlogs.peekLast().id :
          this.tlogs.peekLast().id + " != " + subReader.tlogs.peekLast().id;
{code}
validates that we have purged all tlogs that we no longer want to keep (those 
already forwarded); subReader is the mainTlogReader itself. However, after 
SOLR-9922, tlogs are no longer buffered in the manner they were before while 
cores are in recovery (please correct me, as I don't understand every nuance 
of tlogs very well).

{{this.tlogs.peekLast().id}} can now be greater than 
{{subReader.tlogs.peekLast().id}}, which means all useless tlogs have already 
been purged and {{forwardSeek}} doesn't have to do anything. That is fine as 
long as no updates are missed.
 If we change the assertion to 
{{this.tlogs.peekLast().id >= subReader.tlogs.peekLast().id}} (in which case 
this while loop in {{forwardSeek}} is rightly skipped):
{code:java}
      while (this.tlogs.peekLast().id < subReader.tlogs.peekLast().id) {
        tlogs.removeLast();
        currentTlog = tlogs.peekLast();
      }
{code}
everything is fine and all tests pass.
 I ran 300 beasting rounds of {{CdcrBidirectionalTest}} with this assertion and 
all of them passed.
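
To make the reasoning above concrete, here is a minimal, self-contained sketch of the purge loop with the relaxed check. It is not the Solr code: {{ForwardSeekSketch}}, its method and the use of plain {{Long}} ids in place of {{TransactionLog}} objects are hypothetical stand-ins, and I am assuming that the assertion runs after the loop and that {{peekLast()}} returns the oldest tlog still referenced.
```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/** Hypothetical model of the purge step; tlogs are represented by their ids only. */
final class ForwardSeekSketch {

    /**
     * Drops tlogs from the tail of readerTlogs while they are older than the
     * tail of subReaderTlogs, then applies the relaxed (>=) check.
     * Returns the number of tlogs that were dropped.
     */
    static int forwardSeek(Deque<Long> readerTlogs, Deque<Long> subReaderTlogs) {
        if (readerTlogs.isEmpty() || subReaderTlogs.isEmpty()) {
            throw new IllegalStateException("both readers need at least one tlog");
        }
        long target = subReaderTlogs.peekLast();
        int removed = 0;
        // Same condition as the while loop quoted above; the size guard is an
        // addition of this sketch so the deque is never emptied.
        while (readerTlogs.size() > 1 && readerTlogs.peekLast() < target) {
            readerTlogs.removeLast();
            removed++;
        }
        // Relaxed check: the tail may already be newer than the sub reader's tail.
        if (readerTlogs.peekLast() < target) {
            throw new AssertionError(readerTlogs.peekLast() + " < " + target);
        }
        return removed;
    }

    public static void main(String[] args) {
        // Reader still holds one tlog (id 3) older than the sub reader's tail (id 4).
        Deque<Long> behind = new ArrayDeque<>(Arrays.asList(5L, 4L, 3L));
        Deque<Long> sub = new ArrayDeque<>(Arrays.asList(5L, 4L));
        System.out.println("removed = " + forwardSeek(behind, sub));

        // Reader's tail (id 5) is already newer: the loop is skipped and the check
        // passes, whereas the strict == check would have failed here.
        Deque<Long> ahead = new ArrayDeque<>(Arrays.asList(6L, 5L));
        System.out.println("removed = " + forwardSeek(ahead, sub));
    }
}
```
The second call in {{main}} models the scenario described above: nothing is left to purge, so the loop does not run and only the strict equality would have objected.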

I missed this bug in SOLR-9922 because I didn't expect the above scenario to 
happen; even when it does happen, it is legitimate and OK.


> CdcrBidirectionalTest.testBiDir() regularly fails
> -------------------------------------------------
>
>                 Key: SOLR-12524
>                 URL: https://issues.apache.org/jira/browse/SOLR-12524
>             Project: Solr
>          Issue Type: Test
>          Components: CDCR, Tests
>            Reporter: Christine Poerschke
>            Priority: Major
>         Attachments: SOLR-12524.patch, SOLR-12524.patch, SOLR-12524.patch, 
> SOLR-12524.patch, SOLR-12524.patch, beast-test-run
>
>
> e.g. from 
> https://jenkins.thetaphi.de/job/Lucene-Solr-master-MacOSX/4701/consoleText
> {code}
> [junit4] ERROR   20.4s J0 | CdcrBidirectionalTest.testBiDir <<<
> [junit4]    > Throwable #1: 
> com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an 
> uncaught exception in thread: Thread[id=28371, 
> name=cdcr-replicator-11775-thread-1, state=RUNNABLE, 
> group=TGRP-CdcrBidirectionalTest]
> [junit4]    >         at 
> __randomizedtesting.SeedInfo.seed([CA5584AC7009CD50:8F8E744E68278112]:0)
> [junit4]    > Caused by: java.lang.AssertionError
> [junit4]    >         at 
> __randomizedtesting.SeedInfo.seed([CA5584AC7009CD50]:0)
> [junit4]    >         at 
> org.apache.solr.update.CdcrUpdateLog$CdcrLogReader.forwardSeek(CdcrUpdateLog.java:611)
> [junit4]    >         at 
> org.apache.solr.handler.CdcrReplicator.run(CdcrReplicator.java:125)
> [junit4]    >         at 
> org.apache.solr.handler.CdcrReplicatorScheduler.lambda$null$0(CdcrReplicatorScheduler.java:81)
> [junit4]    >         at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
> [junit4]    >         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [junit4]    >         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [junit4]    >         at java.lang.Thread.run(Thread.java:748)
> {code}



