[ https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150114#comment-16150114 ]

Amrit Sarkar edited comment on SOLR-11278 at 9/1/17 6:41 AM:
-------------------------------------------------------------

Rambling again:

What is the purpose of bootstrapFuture, essentially? To get the status of the current operation, right?

In CdcrRequestHandler.java :: there are some custom debug log lines I added; ignore them:

{code}
    Runnable runnable = () -> {
      Lock recoveryLock = req.getCore().getSolrCoreState().getRecoveryLock();
      boolean locked = recoveryLock.tryLock();
      SolrCoreState coreState = core.getSolrCoreState();
      try {
        if (!locked) {
          log.info("we reached this point :: CANCEL BOOTSTRAP, locked :: " + locked);
          handleCancelBootstrap(req, rsp);
        } else if (leaderStateManager.amILeader()) {
          coreState.setCdcrBootstrapRunning(true);
          //running.set(true);
          String masterUrl = req.getParams().get(ReplicationHandler.MASTER_URL);
          BootstrapCallable bootstrapCallable = new BootstrapCallable(masterUrl, core);
          coreState.setCdcrBootstrapCallable(bootstrapCallable);
          Future<Boolean> bootstrapFuture = core.getCoreContainer().getUpdateShardHandler().getRecoveryExecutor()
              .submit(bootstrapCallable);
          try {
            log.info("we reached this point :: all good, bootstrapFuture.get :: " + bootstrapFuture.get());
          } catch (Exception e) {
            log.error("bootstrapFuture.get :: ", e);
          }
          coreState.setCdcrBootstrapFuture(bootstrapFuture);
          try {
            bootstrapFuture.get();
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            log.warn("Bootstrap was interrupted", e);
          } catch (ExecutionException e) {
            log.error("Bootstrap operation failed", e);
          }
        } else {
          log.error("Action {} sent to non-leader replica @ {}:{}. Aborting bootstrap.", CdcrParams.CdcrAction.BOOTSTRAP, collectionName, shard);
        }
      } finally {
        if (locked) {
          coreState.setCdcrBootstrapRunning(false);
          recoveryLock.unlock();
        }
      }
    };
{code}
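On the bootstrapFuture question: a Future is only useful as a status handle if it is published before anyone blocks on it. With the debug {{bootstrapFuture.get()}} inside the first log.info in place, the handler thread waits for the whole bootstrap before {{setCdcrBootstrapFuture}} runs, so a concurrent status check could not see the running task. Below is a minimal sketch of that idea; {{BootstrapStatusTracker}}, {{BootstrapStatusDemo}} and the status labels are hypothetical names for illustration, not Solr's SolrCoreState API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical tracker (not Solr's SolrCoreState): shows that a Future is only
// useful for status checks if it is published before anyone blocks on it.
class BootstrapStatusTracker {
  private final AtomicReference<Future<Boolean>> current = new AtomicReference<>();

  // Submit the task and publish its Future right away, so a concurrent
  // status request can see it while the task is still running.
  Future<Boolean> start(ExecutorService executor, Callable<Boolean> task) {
    Future<Boolean> future = executor.submit(task);
    current.set(future);
    return future;
  }

  // Non-blocking status query; get() is only called once isDone() is true.
  String status() {
    Future<Boolean> f = current.get();
    if (f == null) return "notfound";
    if (!f.isDone()) return "running";
    if (f.isCancelled()) return "cancelled";
    try {
      return Boolean.TRUE.equals(f.get()) ? "completed" : "failed";
    } catch (ExecutionException e) {
      return "failed";
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return "failed";
    }
  }
}

class BootstrapStatusDemo {
  // Returns the status seen while the task is blocked, then after it finishes.
  static String[] run() throws Exception {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    CountDownLatch release = new CountDownLatch(1);
    BootstrapStatusTracker tracker = new BootstrapStatusTracker();
    try {
      Future<Boolean> f = tracker.start(executor, () -> {
        release.await(); // simulate a long-running bootstrap
        return true;
      });
      String whileRunning = tracker.status();
      release.countDown();
      f.get(); // the caller may still wait, but only after publishing the Future
      return new String[] { whileRunning, tracker.status() };
    } finally {
      executor.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    String[] s = run();
    System.out.println(s[0] + " -> " + s[1]); // prints "running -> completed"
  }
}
```

The design choice here is simply ordering: submit, publish, then (optionally) wait. Whether the real handler should wait at all is a separate question.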

*bootstrapFuture.get()* throws:

{quote}
  [beaster]   2> 43072 ERROR (updateExecutor-39-thread-1-processing-n:127.0.0.1:41488_solr x:cdcr-target_shard1_replica_n1 s:shard1 c:cdcr-target r:core_node2) [n:127.0.0.1:41488_solr c:cdcr-target s:shard1 r:core_node2 x:cdcr-target_shard1_replica_n1] o.a.s.h.CdcrRequestHandler Bootstrap operation failed
  [beaster]   2> java.util.concurrent.ExecutionException: java.lang.AssertionError
  [beaster]   2>        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
  [beaster]   2>        at java.util.concurrent.FutureTask.get(FutureTask.java:192)
  [beaster]   2>        at org.apache.solr.handler.CdcrRequestHandler.lambda$handleBootstrapAction$0(CdcrRequestHandler.java:653)
  [beaster]   2>        at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
  [beaster]   2>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  [beaster]   2>        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  [beaster]   2>        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:188)
  [beaster]   2>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  [beaster]   2>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  [beaster]   2>        at java.lang.Thread.run(Thread.java:748)
  [beaster]   2> Caused by: java.lang.AssertionError
  [beaster]   2>        at org.apache.solr.handler.CdcrRequestHandler$BootstrapCallable.call(CdcrRequestHandler.java:804)
  [beaster]   2>        at org.apache.solr.handler.CdcrRequestHandler$BootstrapCallable.call(CdcrRequestHandler.java:723)
  [beaster]   2>        at com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:197)
  [beaster]   2>        ... 5 more
{quote}

and the bootstrap operation fails.
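The shape of that trace is standard JDK behaviour: FutureTask.run() records whatever the callable throws, including an Error such as AssertionError, and get() rethrows it wrapped in an ExecutionException, which is why the real failure only shows up under "Caused by". A minimal sketch (the class and helper names are made up for illustration; the failing callable is a stand-in for BootstrapCallable, not the real one):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class TaskFailureDemo {
  // Waits for the task and returns what the task itself threw, or null on success.
  // FutureTask records any Throwable (including Errors such as AssertionError)
  // and get() rethrows it wrapped in an ExecutionException.
  static Throwable taskFailure(Future<?> future) throws InterruptedException {
    try {
      future.get();
      return null;
    } catch (ExecutionException e) {
      return e.getCause();
    }
  }

  static Throwable run() throws InterruptedException {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      // Stand-in for BootstrapCallable.call() failing `assert dropped;`.
      // An explicit throw is used so the demo does not depend on -ea.
      Callable<Boolean> failing = () -> {
        throw new AssertionError("dropped == false");
      };
      return taskFailure(executor.submit(failing));
    } finally {
      executor.shutdown();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(run()); // prints "java.lang.AssertionError: dropped == false"
  }
}
```

So when debugging, logging {{e.getCause()}} from the ExecutionException gives the callable's own failure directly.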

FutureTask.java ::
{code}
    /**
     * Returns result or throws exception for completed task.
     * @param s completed state value
     */
    @SuppressWarnings("unchecked")
    private V report(int s) throws ExecutionException {
        Object x = outcome;
        if (s == NORMAL)
            return (V)x;
        if (s >= CANCELLED)
            throw new CancellationException();
        throw new ExecutionException((Throwable)x);
    }
{code}
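report() has three outcomes: return the value (NORMAL), throw CancellationException (any state >= CANCELLED), or throw ExecutionException wrapping the task's Throwable. The sketch below exercises all three directly on FutureTask; {{ReportOutcomeDemo}} is a made-up name for illustration.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

class ReportOutcomeDemo {
  // Maps each branch of FutureTask.report() to a label.
  static String outcome(Future<Boolean> f) {
    try {
      return "normal:" + f.get();                // s == NORMAL
    } catch (CancellationException e) {          // s >= CANCELLED (unchecked)
      return "cancelled";
    } catch (ExecutionException e) {             // task threw: cause is kept
      return "failed:" + e.getCause().getClass().getSimpleName();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return "interrupted";
    }
  }

  static String[] run() {
    FutureTask<Boolean> ok = new FutureTask<Boolean>(() -> true);
    ok.run();
    FutureTask<Boolean> failed = new FutureTask<Boolean>(() -> {
      throw new IllegalStateException("boom");
    });
    failed.run();
    FutureTask<Boolean> cancelled = new FutureTask<Boolean>(() -> true);
    cancelled.cancel(true); // cancelled before it ever ran
    return new String[] { outcome(ok), outcome(failed), outcome(cancelled) };
  }

  public static void main(String[] args) {
    // prints "normal:true failed:IllegalStateException cancelled"
    System.out.println(String.join(" ", run()));
  }
}
```

One thing this highlights for the handler code above: the second {{bootstrapFuture.get()}} only catches InterruptedException and ExecutionException. CancellationException is a RuntimeException, so if the future were cancelled while the handler thread is waiting on it, that exception would propagate out of the Runnable rather than being logged there.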

and the failing assertion is in the {{finally}} block of BootstrapCallable's call() method ::
{code}
        if (closed || !success) {
          // we cannot apply the buffer in this case because it will introduce newer versions in the
          // update log and then the source cluster will get those versions via collectioncheckpoint
          // causing the versions in between to be completely missed
          boolean dropped = ulog.dropBufferedUpdates();
          assert dropped;
        }
{code}

{{dropped}} is false, so are the buffered updates not being cleared / dropped?
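To reason about when {{dropped}} can be false, here is a toy model of a buffering update log. It is hypothetical ({{BufferingLogModel}} is not Solr's UpdateLog) and rests on one assumption that should be checked against UpdateLog in branch_6_6: that dropBufferedUpdates() only succeeds while the log is in a buffering state and returns false otherwise.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of an update log's buffering state. It is NOT
// Solr's UpdateLog; it only assumes that dropping the buffer succeeds while the
// log is in a buffering state and reports false otherwise.
class BufferingLogModel {
  enum State { ACTIVE, BUFFERING }

  private State state = State.ACTIVE;
  private final List<String> buffered = new ArrayList<>();

  synchronized void bufferUpdates() {
    state = State.BUFFERING;
    buffered.clear();
  }

  synchronized void add(String update) {
    if (state == State.BUFFERING) buffered.add(update);
  }

  // Applies the buffer and leaves buffering mode; returns how many were applied.
  synchronized int applyBufferedUpdates() {
    if (state != State.BUFFERING) return 0;
    int n = buffered.size();
    buffered.clear();
    state = State.ACTIVE;
    return n;
  }

  // Discards the buffer; false means there was no buffer to drop.
  synchronized boolean dropBufferedUpdates() {
    if (state != State.BUFFERING) return false;
    buffered.clear();
    state = State.ACTIVE;
    return true;
  }

  // Three scenarios and what the drop returns in each.
  static boolean[] run() {
    BufferingLogModel neverBuffered = new BufferingLogModel();
    boolean a = neverBuffered.dropBufferedUpdates(); // buffering never started

    BufferingLogModel log = new BufferingLogModel();
    log.bufferUpdates();
    log.add("doc-1");
    boolean b = log.dropBufferedUpdates();           // normal drop

    log.bufferUpdates();
    log.add("doc-2");
    log.applyBufferedUpdates();                      // another path already applied it
    boolean c = log.dropBufferedUpdates();
    return new boolean[] { a, b, c };
  }

  public static void main(String[] args) {
    boolean[] r = run();
    System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints "false true false"
  }
}
```

In this model, {{dropped == false}} means either buffering was never started for the core, or something else (another thread, a cancel, a retry) already applied or dropped the buffer before the {{finally}} block ran.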

Earlier, the test's document-count assertion failed with {{2000 to 1000 or 1001}} (expected vs. actual); recently I got {{2000 to 1100}}.

I will test with the buffer disabled and see whether anything changes.


was (Author: sarkaramr...@gmail.com):
and the assertion failure is in the same function's {{finally}} block ::
{code}
        if (closed || !success) {
          // we cannot apply the buffer in this case because it will introduce newer versions in the
          // update log and then the source cluster will get those versions via collectioncheckpoint
          // causing the versions in between to be completely missed
          boolean dropped = ulog.dropBufferedUpdates();
          assert dropped;
        }
{code}

{{dropped}} is false, so are the buffered updates not being cleared / dropped? I understand it is calling its own function, but it is difficult to follow who is calling what and what is being returned.

> CdcrBootstrapTest failing in branch_6_6
> ---------------------------------------
>
>                 Key: SOLR-11278
>                 URL: https://issues.apache.org/jira/browse/SOLR-11278
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: CDCR
>            Reporter: Amrit Sarkar
>            Assignee: Varun Thacker
>         Attachments: SOLR-11278-cancel-bootstrap-on-stop.patch, SOLR-11278.patch, test_results
>
>
> I ran beast for 10 rounds:
> {code}
> ant beast -Dtestcase=CdcrBootstrapTest -Dtests.multiplier=2 -Dtests.slow=true -Dtests.locale=vi -Dtests.timezone=Asia/Yekaterinburg -Dtests.asserts=true -Dtests.file.encoding=US-ASCII -Dbeast.iters=10
> {code}
> and saw the following failure:
> {code}
>   [beaster] [01:37:16.282] FAILURE  153s | CdcrBootstrapTest.testBootstrapWithSourceCluster <<<
>   [beaster]    > Throwable #1: java.lang.AssertionError: Document mismatch on target after sync expected:<2000> but was:<1000>
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
