[ 
https://issues.apache.org/jira/browse/SOLR-13240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16788105#comment-16788105
 ] 

Richard commented on SOLR-13240:
--------------------------------

I have come across this same problem, except that I see it when experimenting 
with the autoscaling features, while Solr is running the ComputePlan. 

I did some debugging of the underlying culprit:
{code:java}
List<Pair<ReplicaInfo, Row>> validReplicas = getValidReplicas(true, true, -1);
for ( Pair<ReplicaInfo, Row> x : validReplicas ) {
  log.warn("Pair: {}", x.first());
}
validReplicas.sort(leaderLast);
{code}

I found the following results, which I find really odd _(I've removed any 
company-sensitive information, hence the placeholder collection names and node URLs)_:
{code:json}
  {
    "first": {
      "core_node95": {
        "core": "collection_name1_shard8_replica_n92",
        "leader": "true",
        "INDEX.sizeInBytes": 6.426125764846802e-08,
        "base_url": "http://127.0.0.1:8081/solr",
        "node_name": "127.0.0.1:8081_solr",
        "state": "active",
        "type": "NRT",
        "force_set_state": "false",
        "shard": "shard8",
        "collection": "collection_name1"
      }
    },
    "second": "127.0.0.1:8081_solr"
  },
  {
    "first": {
      "core_node7": {
        "core": "collection_name1_shard12_replica_n4",
        "leader": "true",
        "INDEX.sizeInBytes": 6.426125764846802e-08,
        "base_url": "http://127.0.0.1:8081/solr",
        "node_name": "127.0.0.1:8081_solr",
        "state": "active",
        "type": "NRT",
        "force_set_state": "false",
        "shard": "shard12",
        "collection": "collection_name1"
      }
    },
    "second": "127.0.0.1:8081_solr"
  },
  {
    "first": {
      "7851000": {
        "type": "NRT",
        "INDEX.sizeInBytes": 6.426125764846802e-08,
        "core": "7851000",
        "shard": "shard2",
        "collection": "collection_name2",
        "node_name": "0.0.0.0:8080_solr"
      }
    },
    "second": "0.0.0.0:8080_solr"
  }
{code}

I've tried to create a simple unit test for the Comparator in 
{{MoveReplicaSuggester.java}} _(which is causing this problem)_, but I am 
struggling to reproduce the exception. I've tried lists of {{ReplicaInfo}} 
objects where all are leaders, none is a leader, and all but one are leaders, 
as well as inverted and alternating arrangements. So I'm not sure whether the 
real problem lies in whatever builds the ReplicaInfo, given that it creates a 
{{Pair<ReplicaInfo, Row>}} object with that kind of data _(the last JSON 
blob)_. 
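
For anyone else trying to reproduce this, a rough sketch of a direct contract check may be more reliable than waiting for the sort to throw. {{java.util.TimSort}} uses binary insertion sort for arrays shorter than 32 elements and only raises this exception from its merge steps, so a short test list will not throw even when the comparator is inconsistent. In the sketch below, {{Replica}} is a hypothetical stand-in for {{ReplicaInfo}}, {{SUSPECT}} is only my assumption of what a comparator with the described flaw looks like (it is not copied from {{MoveReplicaSuggester.java}}), and treating the replica without a {{leader}} property as a non-leader is also an assumption.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class LeaderLastContractCheck {

  // Hypothetical stand-in for ReplicaInfo: only the leader flag matters here.
  static final class Replica {
    final String name;
    final boolean isLeader;

    Replica(String name, boolean isLeader) {
      this.name = name;
      this.isLeader = isLeader;
    }
  }

  // Assumed shape of the flawed comparator: it looks at the first argument
  // before the second, so two leaders compare as "greater" in both directions.
  static final Comparator<Replica> SUSPECT = (r1, r2) -> {
    if (r1.isLeader) return 1;
    if (r2.isLeader) return -1;
    return 0;
  };

  // Contract-respecting alternative: false sorts before true, so leaders go last.
  static final Comparator<Replica> LEADER_LAST =
      (r1, r2) -> Boolean.compare(r1.isLeader, r2.isLeader);

  // Returns true if sgn(compare(a, b)) == -sgn(compare(b, a)) for every pair,
  // including each element compared with itself.
  static <T> boolean isAntisymmetric(Comparator<T> cmp, List<T> items) {
    for (T a : items) {
      for (T b : items) {
        int ab = Integer.signum(cmp.compare(a, b));
        int ba = Integer.signum(cmp.compare(b, a));
        if (ab != -ba) {
          return false;
        }
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Mirrors the three entries logged above: two leaders and one replica
    // with no leader property (assumed to mean "not a leader").
    List<Replica> replicas = Arrays.asList(
        new Replica("core_node95", true),
        new Replica("core_node7", true),
        new Replica("7851000", false));

    System.out.println("suspect antisymmetric: " + isAntisymmetric(SUSPECT, replicas));
    System.out.println("leaderLast antisymmetric: " + isAntisymmetric(LEADER_LAST, replicas));
  }
}
```

A check like this fails as soon as the list contains two leaders, or even a single leader compared with itself, regardless of list length or ordering, which would explain why the small hand-built lists sort without complaint.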

> UTILIZENODE action results in an exception
> ------------------------------------------
>
>                 Key: SOLR-13240
>                 URL: https://issues.apache.org/jira/browse/SOLR-13240
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 7.6
>            Reporter: Hendrik Haddorp
>            Priority: Major
>
> When I invoke the UTILIZENODE action the REST call fails like this after it 
> moved a few replicas:
> {
>   "responseHeader":{
>     "status":500,
>     "QTime":40220},
>   "Operation utilizenode caused 
> exception:":"java.lang.IllegalArgumentException:java.lang.IllegalArgumentException:
>  Comparison method violates its general contract!",
>   "exception":{
>     "msg":"Comparison method violates its general contract!",
>     "rspCode":-1},
>   "error":{
>     "metadata":[
>       "error-class","org.apache.solr.common.SolrException",
>       "root-error-class","org.apache.solr.common.SolrException"],
>     "msg":"Comparison method violates its general contract!",
>     "trace":"org.apache.solr.common.SolrException: Comparison method violates 
> its general contract!\n\tat 
> org.apache.solr.client.solrj.SolrResponse.getException(SolrResponse.java:53)\n\tat
>  
> org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:274)\n\tat
>  
> org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:246)\n\tat
>  
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\n\tat
>  
> org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:734)\n\tat 
> org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:715)\n\tat
>  org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:496)\n\tat 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)\n\tat
>  
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\n\tat
>  
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\n\tat
>  
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)\n\tat
>  
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>  
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>  org.eclipse.jetty.server.Server.handle(Server.java:531)\n\tat 
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)\n\tat 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)\n\tat
>  
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)\n\tat
>  org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)\n\tat 
> org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)\n\tat 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)\n\tat
>  
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)\n\tat
>  
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)\n\tat
>  
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)\n\tat
>  
> org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)\n\tat
>  
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:762)\n\tat
>  
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:680)\n\tat
>  java.lang.Thread.run(Thread.java:748)\n",
>     "code":500}} 
> The logs show this additional exception:
> 2019-02-10 00:09:00.539 ERROR 
> (OverseerThreadFactory-1268-thread-38-processing-n:agent2:9151_solr) [   ] 
> o.a.s.c.a.c.OverseerCollectionMessageHandler Operation utilizenode 
> failed:java.lang.IllegalArgumentException: Comparison method violates its 
> general contract!
>     at java.util.TimSort.mergeLo(TimSort.java:777)
>     at java.util.TimSort.mergeAt(TimSort.java:514)
>     at java.util.TimSort.mergeCollapse(TimSort.java:439)
>     at java.util.TimSort.sort(TimSort.java:245)
>     at java.util.Arrays.sort(Arrays.java:1512)
>     at java.util.ArrayList.sort(ArrayList.java:1462)
>     at 
> org.apache.solr.client.solrj.cloud.autoscaling.MoveReplicaSuggester.tryEachNode(MoveReplicaSuggester.java:50)
>     at 
> org.apache.solr.client.solrj.cloud.autoscaling.MoveReplicaSuggester.init(MoveReplicaSuggester.java:38)
>     at 
> org.apache.solr.client.solrj.cloud.autoscaling.Suggester.getSuggestion(Suggester.java:187)
>     at 
> org.apache.solr.cloud.api.collections.UtilizeNodeCmd.call(UtilizeNodeCmd.java:100)
>     at 
> org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:259)
>     at 
> org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:478)
>     at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748) 
> I suspect this to be caused by this comparator: 
> https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/MoveReplicaSuggester.java#L98
> In case it is possible that both compared replicas are leaders the result 
> would not be correct.
> see also the mail thread about this:
> https://www.mail-archive.com/[email protected]&q=subject:%22Re%5C%3A+CloudSolrClient+getDocCollection%22&o=newest&f=1



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)