[ 
https://issues.apache.org/jira/browse/SOLR-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-6517:
---------------------------------
    Attachment: SOLR-6517.patch

Reviewboard here: https://reviews.apache.org/r/26632/


Here's a patch for people to poke holes in. It is NOT ready to commit. I 
started out with a really simple throttling mechanism and then went to a more 
sophisticated one. I wanted folks to have an opportunity to critique both 
approaches, so they're both in this patch. Of course I'll pull one out before 
committing.

The meat of the differences is in collectionsHandler.handleReassignLeadersA 
and collectionsHandler.handleReassignLeadersB. Of the two, the B variant is my 
favorite by far. I hope to commit this late next week...

In one approach (the original crude one, see 
collectionsHandler.handleReassignLeadersA), the parameter "maxToReassign" just 
queues up the indicated number of leader reassignments and returns when they 
are done. maxToReassign defaults to Integer.MAX_VALUE. The process here would 
be to keep reassigning, say, 5 leaders until the collection was balanced. But 
the onus is on the consumer to figure out when enough were done.
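
To make the caller's side of this first approach concrete, here is a minimal 
sketch in plain Java. It is not code from the patch: the class 
BatchReassignSketch, the method nextBatch and the shard names are hypothetical 
and only model "reassign at most maxToReassign, then let the caller decide 
whether to call again".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in the patch.
class BatchReassignSketch {

  /** Removes and returns at most maxToReassign entries from the front of pending. */
  static <T> List<T> nextBatch(List<T> pending, int maxToReassign) {
    int n = Math.min(pending.size(), Math.max(0, maxToReassign));
    List<T> head = pending.subList(0, n);
    List<T> batch = new ArrayList<>(head);
    head.clear(); // drops the same entries from the caller's pending list
    return batch;
  }

  public static void main(String[] args) {
    // Stand-ins for shards whose leader is on the "wrong" node.
    List<String> pending = new ArrayList<>(Arrays.asList(
        "shard1", "shard2", "shard3", "shard4", "shard5", "shard6", "shard7"));
    int calls = 0;
    // The caller, not the server, decides when enough has been done.
    while (!pending.isEmpty()) {
      List<String> batch = nextBatch(pending, 5);
      calls++;
      System.out.println("call " + calls + " reassigns " + batch);
    }
  }
}
```

The loop in main is the part the consumer would have to own in this mode, 
which is the burden described above.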

The other mode, collectionsHandler.handleReassignLeadersB also takes 
"maxToReassign", but in this flavor it's the number of outstanding 
reassignments to allow at once; defaults to Integer.MAX_VALUE. When the limit 
is reached, the process waits until at least one of them completes, then queues 
up enough to get back to that max. QUESTION: Is there a better way to find out 
when an async process is done besides the poll/sleep loop in 
collectionsHandler.waitForLeaderChange?
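
For anyone trying to picture the "N outstanding at once" behavior, here is a 
rough sketch using a java.util.concurrent.Semaphore. It is an illustration 
under assumptions, not the patch's implementation: BoundedReassignSketch and 
runThrottled are made-up names, each reassignment is modeled as a Runnable, and 
"complete" just means the Runnable returned. It says nothing about how 
completion is detected in Solr.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; none of these names exist in the patch.
class BoundedReassignSketch {

  /** Runs every task with at most maxOutstanding in flight; returns the peak seen. */
  static int runThrottled(List<Runnable> reassignments, int maxOutstanding) {
    if (maxOutstanding < 1) {
      throw new IllegalArgumentException("maxOutstanding must be at least 1");
    }
    Semaphore slots = new Semaphore(maxOutstanding);
    AtomicInteger inFlight = new AtomicInteger();
    AtomicInteger peak = new AtomicInteger();
    ExecutorService pool = Executors.newCachedThreadPool();
    try {
      for (Runnable reassignment : reassignments) {
        slots.acquire(); // at the limit this blocks until one task finishes
        pool.execute(() -> {
          peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
          try {
            reassignment.run();
          } finally {
            inFlight.decrementAndGet();
            slots.release(); // frees a slot so the next one can be queued
          }
        });
      }
      pool.shutdown();
      if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
        throw new IllegalStateException("tasks did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while throttling", e);
    } finally {
      pool.shutdownNow(); // no-op after a clean shutdown
    }
    return peak.get();
  }

  public static void main(String[] args) {
    List<Runnable> tasks = new ArrayList<>();
    for (int i = 0; i < 6; i++) {
      tasks.add(() -> {
        try {
          Thread.sleep(50); // pretend a leader change takes a while
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }
    System.out.println("peak in flight: " + runThrottled(tasks, 2));
  }
}
```

With the default of Integer.MAX_VALUE the semaphore never blocks, so 
everything is queued at once, matching the unthrottled default described above.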

Additionally, in this mode maxToReassignWait is the number of seconds to wait 
for a reassignment to complete before giving up. It's a bail-out so the call 
isn't stuck forever. The default is 30 seconds. It's a little loose in that 
even after the call bails out, the reassignment may still be in progress and 
_eventually_ complete.
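
The general shape of a poll/sleep wait with a bail-out looks roughly like the 
sketch below. This is not the body of collectionsHandler.waitForLeaderChange: 
WaitWithBailOutSketch, waitFor and the BooleanSupplier standing in for "has the 
leader changed yet?" are all assumptions made for illustration.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical illustration only; none of these names exist in the patch.
class WaitWithBailOutSketch {

  /**
   * Polls the condition until it is true or maxWaitMillis has elapsed.
   * Returns true if the condition was seen, false if the wait bailed out.
   */
  static boolean waitFor(BooleanSupplier leaderChanged, long maxWaitMillis, long pollMillis) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
    while (true) {
      if (leaderChanged.getAsBoolean()) {
        return true;
      }
      long remaining = deadline - System.nanoTime();
      if (remaining <= 0) {
        return false; // bail out; the underlying work may still finish later
      }
      try {
        // Never sleep past the deadline.
        TimeUnit.NANOSECONDS.sleep(Math.min(remaining, TimeUnit.MILLISECONDS.toNanos(pollMillis)));
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }

  public static void main(String[] args) {
    AtomicInteger polls = new AtomicInteger();
    // Condition becomes true on the third check.
    boolean seen = waitFor(() -> polls.incrementAndGet() >= 3, 2000, 10);
    System.out.println("seen=" + seen + " after " + polls.get() + " polls");
    // Condition never becomes true, so the call gives up after about 100 ms.
    System.out.println("seen=" + waitFor(() -> false, 100, 10));
  }
}
```

A false return here only means the caller stopped watching, which is the 
looseness noted above.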

I should emphasize that only _one_ of the methods will make it to the final 
patch, almost certainly the second one unless there are howls.

There's quite a bit of information returned in the result set, which is another 
advantage of the second method. There's an example at the end of this message, 
although it lacks the "failures" node because there weren't any.

I should also emphasize that I'm sure stuff will pop out when I look at it 
fresh tomorrow, but the current form is enough to have people look at and poke 
holes in.

Erick

Sample response (note, I'll get rid of the "reassignleaders_" prefix).

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">523</int>
  </lst>
  <lst name="successes">
    <lst name="reassignleaders_eoe_shard1_replica3">
      <str name="status">success</str>
      <str name="msg">
        Assigned 'Collection: 'eoe', Shard: 'shard1', Core: 'eoe_shard1_replica3',
        BaseUrl: 'http://192.168.1.201:7600/solr'' to be leader
      </str>
    </lst>
    <lst name="reassignleaders_eoe_shard2_replica4">
      <str name="status">success</str>
      <str name="msg">
        Assigned 'Collection: 'eoe', Shard: 'shard2', Core: 'eoe_shard2_replica4',
        BaseUrl: 'http://192.168.1.201:7300/solr'' to be leader
      </str>
    </lst>
    <lst name="reassignleaders_eoe_shard3_replica4">
      <str name="status">success</str>
      <str name="msg">
        Assigned 'Collection: 'eoe', Shard: 'shard3', Core: 'eoe_shard3_replica4',
        BaseUrl: 'http://192.168.1.201:7400/solr'' to be leader
      </str>
    </lst>
    <lst name="reassignleaders_eoe_shard4_replica4">
      <str name="status">success</str>
      <str name="msg">
        Assigned 'Collection: 'eoe', Shard: 'shard4', Core: 'eoe_shard4_replica4',
        BaseUrl: 'http://192.168.1.201:8983/solr'' to be leader
      </str>
    </lst>
    <lst name="reassignleaders_eoe_shard6_replica3">
      <str name="status">success</str>
      <str name="msg">
        Assigned 'Collection: 'eoe', Shard: 'shard6', Core: 'eoe_shard6_replica3',
        BaseUrl: 'http://192.168.1.201:7500/solr'' to be leader
      </str>
    </lst>
  </lst>
  <lst name="alreadyLeaders">
    <lst name="core_node21">
      <str name="status">success</str>
      <str name="msg">Already leader</str>
      <str name="nodeName">192.168.1.201:7200_solr</str>
    </lst>
  </lst>
</response>
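
A client that wants to act on a response shaped like the one above could pull 
out the entry names per section with the JDK's DOM parser. This is only a 
sketch: ResponseSectionsSketch and entryNames are hypothetical names, and it 
assumes the top-level sections keep the names shown here ("successes", 
"alreadyLeaders", plus "failures" when there are any).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical illustration only; not part of the patch or of SolrJ.
class ResponseSectionsSketch {

  /** Returns the name attribute of each lst nested in the named top-level section. */
  static List<String> entryNames(String xml, String section) {
    try {
      Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
          .getDocumentElement();
      List<String> names = new ArrayList<>();
      for (Node top = root.getFirstChild(); top != null; top = top.getNextSibling()) {
        // Skip whitespace text nodes and sections other than the requested one.
        if (!(top instanceof Element) || !section.equals(((Element) top).getAttribute("name"))) {
          continue;
        }
        for (Node child = top.getFirstChild(); child != null; child = child.getNextSibling()) {
          if (child instanceof Element && "lst".equals(child.getNodeName())) {
            names.add(((Element) child).getAttribute("name"));
          }
        }
      }
      return names; // empty when the section is absent
    } catch (Exception e) {
      throw new IllegalArgumentException("could not parse response", e);
    }
  }

  public static void main(String[] args) {
    String xml = "<response>"
        + "<lst name=\"successes\"><lst name=\"reassignleaders_eoe_shard1_replica3\"/></lst>"
        + "<lst name=\"alreadyLeaders\"><lst name=\"core_node21\"/></lst>"
        + "</response>";
    System.out.println("successes: " + entryNames(xml, "successes"));
    System.out.println("failures:  " + entryNames(xml, "failures"));
  }
}
```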

> CollectionsAPI call ELECTPREFERREDLEADERS
> -----------------------------------------
>
>                 Key: SOLR-6517
>                 URL: https://issues.apache.org/jira/browse/SOLR-6517
>             Project: Solr
>          Issue Type: New Feature
>    Affects Versions: 5.0, Trunk
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>         Attachments: SOLR-6517.patch
>
>
> Perhaps the final piece of SOLR-6491. Once the preferred leadership roles are 
> assigned, there has to be a command "make it so Mr. Solr". This is something 
> of a placeholder to collect ideas. One wouldn't want to flood the system with 
> hundreds of re-assignments at once. Should this be synchronous or async? 
> Should it make the best attempt but not worry about perfection? Should it???
> A collection=name parameter would be required and it would re-elect all the 
> leaders that were on the 'wrong' node.
> I'm thinking of optionally allowing one to specify a shard in the case where 
> you wanted to make a very specific change. Note that there's no need to 
> specify a particular replica, since there should be only a single 
> preferredLeader per slice.
> This command would do nothing to any slice that did not have a replica with a 
> preferredLeader role. Likewise it would do nothing if the slice in question 
> already had the leader role assigned to the node with the preferredLeader 
> role.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
