[ 
https://issues.apache.org/jira/browse/SOLR-13240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901020#comment-16901020
 ] 

Richard Goodman edited comment on SOLR-13240 at 8/6/19 1:14 PM:
----------------------------------------------------------------

Okay having had a look at this, this is what I understand from it:

With the 
[clusterstate|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.4.0/solr/solrj/src/test/org/apache/solr/client/solrj/cloud/autoscaling/TestPolicy.java#L90-L130]
 that is being loaded, and the 
[policies|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.4.0/solr/solrj/src/test/org/apache/solr/client/solrj/cloud/autoscaling/TestPolicy.java#L329-L338]
 being loaded. Then there is a violation of there being more than 1 replica for 
a shard on the same node _(in this case, node1)_. This is why the first stage 
is moving replica and verifying it has moved to {{node2}}.

What doesn't make sense to me is why it's then moving a replica to node3, 
because this then re-raises a violation of there being more than 1 replica of 
the same shard on the same node. So I don't get why it's doing it again, nor 
how it passes.... If you spot something that I can't see then let me know.

After that the policies change to allow more than 1, but less than 3 replicas 
of the same shard to be on the same node. This is where the test is failing, 
because with the new comparator, it will order replicas based on their name, 
and with the first iteration of moving replicas, it expects replica 3 
_({{r3}})_ to be moved first, however, it would in fact be replica 1 
_({{r1}})_. 

So again it would be just moving the stages around. I'm not sure why doing this 
changes which node it goes too, that is throwing me off a little bit.

I've attached a patch with updates to these tests, ran the {{ant test 
-Dtestcase=TestPolicy}} and it passed, let me know what you think.

 [^SOLR-13240.patch] 


was (Author: goodman):
Okay having had a look at this, this is what I understand from it:

With the 
[clusterstate|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.4.0/solr/solrj/src/test/org/apache/solr/client/solrj/cloud/autoscaling/TestPolicy.java#L90-L130]
 that is being loaded, and the 
[policies|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.4.0/solr/solrj/src/test/org/apache/solr/client/solrj/cloud/autoscaling/TestPolicy.java#L329-L338]
 being loaded. Then there is a violation of there being more than 1 replica for 
a shard on the same node _(in this case, node1)_. This is why the first stage 
is moving replica and verifying it has moved to {{node2}}.

What doesn't make sense to me is why it's then moving a replica to node3, 
because this then re-raises a violation of there being more than 1 replica of 
the same shard on the same node. So I don't get why it's doing it again, nor 
how it passes.... If you spot something that I can't see then let me know.

After that the policies change to allow more than 1, but less than 3 replicas 
of the same shard to be on the same node. This is where the test is failing, 
because with the new comparator, it will order replicas based on their name, 
and with the first iteration of moving replicas, it expects replica 3 
_({{r3}})_ to be moved first, however, it would in fact be replica 1 
_({{r1}})_. 

So again it would be just moving the stages around. I'm not sure why doing this 
changes which node it goes too, that is throwing me off a little bit.

I've attached a patch with updates to these tests, ran the {{ant test 
-Dtestcase=TestPolicy}} and it passed, let me know what you think.



> UTILIZENODE action results in an exception
> ------------------------------------------
>
>                 Key: SOLR-13240
>                 URL: https://issues.apache.org/jira/browse/SOLR-13240
>             Project: Solr
>          Issue Type: Bug
>          Components: AutoScaling
>    Affects Versions: 7.6
>            Reporter: Hendrik Haddorp
>            Priority: Major
>         Attachments: SOLR-13240.patch, SOLR-13240.patch, SOLR-13240.patch, 
> SOLR-13240.patch, SOLR-13240.patch, solr-solrj-7.5.0.jar
>
>
> When I invoke the UTILIZENODE action the REST call fails like this after it 
> moved a few replicas:
> {
>   "responseHeader":{
>     "status":500,
>     "QTime":40220},
>   "Operation utilizenode caused 
> exception:":"java.lang.IllegalArgumentException:java.lang.IllegalArgumentException:
>  Comparison method violates its general contract!",
>   "exception":{
>     "msg":"Comparison method violates its general contract!",
>     "rspCode":-1},
>   "error":{
>     "metadata":[
>       "error-class","org.apache.solr.common.SolrException",
>       "root-error-class","org.apache.solr.common.SolrException"],
>     "msg":"Comparison method violates its general contract!",
>     "trace":"org.apache.solr.common.SolrException: Comparison method violates 
> its general contract!\n\tat 
> org.apache.solr.client.solrj.SolrResponse.getException(SolrResponse.java:53)\n\tat
>  
> org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:274)\n\tat
>  
> org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:246)\n\tat
>  
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\n\tat
>  
> org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:734)\n\tat 
> org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:715)\n\tat
>  org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:496)\n\tat 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)\n\tat
>  
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\n\tat
>  
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\n\tat
>  
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)\n\tat
>  
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>  
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>  org.eclipse.jetty.server.Server.handle(Server.java:531)\n\tat 
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)\n\tat 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)\n\tat
>  
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)\n\tat
>  org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)\n\tat 
> org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)\n\tat 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)\n\tat
>  
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)\n\tat
>  
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)\n\tat
>  
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)\n\tat
>  
> org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)\n\tat
>  
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:762)\n\tat
>  
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:680)\n\tat
>  java.lang.Thread.run(Thread.java:748)\n",
>     "code":500}} 
> The logs show this additional exception:
> 2019-02-10 00:09:00.539 ERROR 
> (OverseerThreadFactory-1268-thread-38-processing-n:agent2:9151_solr) [   ] 
> o.a.s.c.a.c.OverseerCollectionMessageHandler Operation utilizenode 
> failed:java.lang.IllegalArgumentException: Comparison method violates its 
> general contract!
>     at java.util.TimSort.mergeLo(TimSort.java:777)
>     at java.util.TimSort.mergeAt(TimSort.java:514)
>     at java.util.TimSort.mergeCollapse(TimSort.java:439)
>     at java.util.TimSort.sort(TimSort.java:245)
>     at java.util.Arrays.sort(Arrays.java:1512)
>     at java.util.ArrayList.sort(ArrayList.java:1462)
>     at 
> org.apache.solr.client.solrj.cloud.autoscaling.MoveReplicaSuggester.tryEachNode(MoveReplicaSuggester.java:50)
>     at 
> org.apache.solr.client.solrj.cloud.autoscaling.MoveReplicaSuggester.init(MoveReplicaSuggester.java:38)
>     at 
> org.apache.solr.client.solrj.cloud.autoscaling.Suggester.getSuggestion(Suggester.java:187)
>     at 
> org.apache.solr.cloud.api.collections.UtilizeNodeCmd.call(UtilizeNodeCmd.java:100)
>     at 
> org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:259)
>     at 
> org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:478)
>     at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748) 
> I suspect this to be caused by this comparator: 
> https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/MoveReplicaSuggester.java#L98
> In case it is possible that both compared replicas are leaders the result 
> would not be correct.
> see also the mail thread about this:
> https://www.mail-archive.com/[email protected]&q=subject:%22Re%5C%3A+CloudSolrClient+getDocCollection%22&o=newest&f=1



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to