[ 
https://issues.apache.org/jira/browse/SOLR-14058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16993852#comment-16993852
 ] 

Yonik Seeley commented on SOLR-14058:
-------------------------------------

Here's our stacktrace:

{code}
2019-12-11 19:16:35.028 ERROR (zkCallback-10-thread-29) [c:cre_records_cre300000000B6j_0.1.4_0 s:shard2 r:core_node13 x:cre_records_cre300000000B6j_0.1.4_0_shard2_replica_n10] o.a.s.c.SyncStrategy Sync Failed:java.lang.IndexOutOfBoundsException: Index -1 out of bounds for length 32
        at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
        at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
        at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:248)
        at java.base/java.util.Objects.checkIndex(Objects.java:372)
        at java.base/java.util.ArrayList.get(ArrayList.java:458)
        at org.apache.solr.update.PeerSync$MissedUpdatesFinderBase.handleVersionsWithRanges(PeerSync.java:750)
        at org.apache.solr.update.PeerSync$MissedUpdatesFinder.find(PeerSync.java:839)
        at org.apache.solr.update.PeerSync.handleVersions(PeerSync.java:439)
        at org.apache.solr.update.PeerSync.handleResponse(PeerSync.java:373)
        at org.apache.solr.update.PeerSync.sync(PeerSync.java:226)
        at org.apache.solr.cloud.SyncStrategy.syncWithReplicas(SyncStrategy.java:187)
        at org.apache.solr.cloud.SyncStrategy.syncReplicas(SyncStrategy.java:131)
        at org.apache.solr.cloud.SyncStrategy.sync(SyncStrategy.java:109)
        at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:400)
        at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:172)
        at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:137)
        at org.apache.solr.cloud.LeaderElector.access$200(LeaderElector.java:57)
        at org.apache.solr.cloud.LeaderElector$ElectionWatcher.process(LeaderElector.java:350)
        at org.apache.solr.common.cloud.SolrZkClient$ProcessWatchWithExecutor.lambda$process$1(SolrZkClient.java:845)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
{code}
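
For reference, the top frames of the trace (ArrayList.get -> Objects.checkIndex) can be reproduced in isolation. The sketch below is only an illustration of the JDK behavior, not of the PeerSync logic: the class name NegativeIndexDemo and the helper tryGet are hypothetical, and nothing here is taken from Solr. It assumes a JDK 9 or later runtime, where the message format matches the one in the log.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of Solr.
final class NegativeIndexDemo {

    // Builds a list of the given size and calls get(index). Returns the
    // exception thrown by ArrayList, or null if the access succeeds.
    static IndexOutOfBoundsException tryGet(int size, int index) {
        List<Integer> versions = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            versions.add(i);
        }
        try {
            versions.get(index);
            return null;
        } catch (IndexOutOfBoundsException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        // 32 is the list length reported in the log; -1 is the failing index.
        IndexOutOfBoundsException e = tryGet(32, -1);
        System.out.println(e.getMessage());
    }
}
```

This suggests that the caller at PeerSync.java:750 passed -1 as the index into a 32-element list; how that index was computed is not shown by the trace.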

> AIOOBE in PeerSync
> ------------------
>
>                 Key: SOLR-14058
>                 URL: https://issues.apache.org/jira/browse/SOLR-14058
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 8.3
>            Reporter: Yonik Seeley
>            Priority: Major
>
> We hit an exception with 8.3 that someone else also hit on stackoverflow:
> https://stackoverflow.com/questions/58891563/problem-in-syncing-replicas-with-solr-8-3-with-zookeeper-3-5-6
> {quote}
> I recently converted a solr 7.x + zookeeper 3.4.14 to solr 8.3 + zk 3.5.6, 
> and depending on how I start the solr nodes I'm getting a sync exception.
> My setup uses 3 zk nodes and 2 solr nodes (let's call them A and B). The 
> collection that has this problem has 1 shard and 2 replicas. I've noticed 2 
> situations: (1) which works fine and (2) which does not work.
> 1) This works: I start solr node A, and wait until its replica is elected 
> leader ("green" in the Solr interface 'Cloud'->'Graph') - which takes about 2 
> min; and only then start solr node B. Both replicas are active and the one in 
> A is the leader.
> 2) This does NOT work: I start solr node A, and a few secs after I start solr 
> node B (that is, before the 'A' replica is elected leader - still "Down" in 
> the solr interface). In this case I get the following exception:
> ERROR (coreZkRegister-1-thread-2-processing-n:192.168.15.20:8986_solr 
> x:alldata_shard1_replica_n1 c:alldata s:shard1 r:core_node3) [c:alldata 
> s:shard1 r:core_node3 x:alldata_shard1_replica_n1] o.a.s.c.SyncStrategy Sync 
> Failed:java.lang.IndexOutOfBoundsException: Index -1 out of bounds for length 
> 99
> It seems that if both solr nodes are started soon after each other, then ZK 
> cannot elect one as leader. This error only appears in the solr.log of node 
> A, even if I invert the order of starting nodes.
> {quote}


