[ 
https://issues.apache.org/jira/browse/SOLR-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397695#comment-13397695
 ] 

Mark Miller commented on SOLR-3488:
-----------------------------------

Perhaps it's a little too ambitious, but the reason I brought up the idea of the 
Overseer handling collection management every n seconds is:

Let's say you have 4 nodes with 2 collections on them. You want each collection 
to use as many nodes as are available. Now you want to add a new node. To get 
it to participate in the existing collections, you have to configure them, or 
create new compatible cores over HTTP on the new node. Wouldn't it be nice if 
the Overseer just saw the new node, noticed that the collections had 
repFactor=MAX_INT, and created the cores for you?
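As a rough sketch (the class and method names here are hypothetical, not Solr 
APIs): treating repFactor=MAX_INT as "one replica per live node" reduces 
placement for a new node to a simple set difference:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only - not Solr code. With repFactor=MAX_INT the
// effective target is "one replica on every live node", so a newly started
// node shows up as a placement candidate automatically.
public class ReplicaTargets {
    // Cap the configured repFactor at the number of nodes actually available.
    static int targetReplicas(int repFactor, int liveNodeCount) {
        return Math.min(repFactor, liveNodeCount);
    }

    // Live nodes that do not yet host a replica of the collection.
    static List<String> nodesNeedingReplica(List<String> liveNodes,
                                            List<String> nodesWithReplica) {
        List<String> missing = new ArrayList<>(liveNodes);
        missing.removeAll(nodesWithReplica);
        return missing;
    }
}
```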

Also, consider failure scenarios:

If you remove a collection, what happens when a node that was down comes back 
up and still has a piece of that collection? Your collection will be back as a 
single node. An Overseer process could prune this off shortly after.

So numShards/repFactor + Overseer smarts seems simple and good to me. But 
sometimes you may want to be precise in picking shards/replicas. Perhaps simply 
doing some kind of 'rack awareness' type feature down the road is the best way 
to control this, though. You could create connections and weight costs using 
token markers for each node or something.

So I think maybe we would need a new zk node where Solr instances register 
rather than cores? Then we know what is available to place replicas on - even 
if that Solr instance has no cores.
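A minimal in-memory model of that idea (the names are made up for illustration 
and this is not the actual zk layout): instances register themselves 
independently of the cores they host, so empty nodes stay visible as placement 
targets:

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for the proposed zk registration node.
// In zk this would be an ephemeral znode per instance; here, a plain set.
public class NodeRegistry {
    private final Set<String> liveNodes = new TreeSet<>();        // registered instances
    final Map<String, Set<String>> coresByNode = new TreeMap<>(); // instance -> cores

    void register(String node)   { liveNodes.add(node); }    // instance starts (ephemeral create)
    void unregister(String node) { liveNodes.remove(node); } // session expires

    // Instances that are up but host no cores - exactly the nodes that are
    // invisible today when only cores register in zk.
    Set<String> emptyNodes() {
        Set<String> empty = new TreeSet<>(liveNodes);
        empty.removeAll(coresByNode.keySet());
        return empty;
    }
}
```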

Then the Overseer would have a process that ran every n seconds (1 min?) and 
looked at each collection and its repFactor and numShards, and added or pruned 
given the current state.
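That periodic pass could boil down to a per-collection delta computation 
(again a hypothetical sketch, not Solr code; a positive result means create 
cores, a negative one means prune):

```java
// Hypothetical sketch of the periodic Overseer reconciliation pass.
public class ReconcilePass {
    // Desired replicas per shard is repFactor capped by the live node count,
    // so repFactor=MAX_INT means "use every available node".
    static int delta(int numShards, int repFactor, int liveNodes, int currentReplicas) {
        long perShard = Math.min((long) repFactor, liveNodes);
        long desired = perShard * numShards;
        return (int) (desired - currentReplicas); // >0: add cores, <0: prune
    }
}
```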

This would also account for failures during collection creation or deletion. If 
a node was down and missed the operation, then when it came back, within n 
seconds the Overseer would add or prune with the restored node.

It handles a lot of failure scenarios (with some lag) and makes the interface 
to the user a lot simpler. Adding nodes can eventually mean just starting up a 
new node rather than requiring any config. It's also easy to deal with changing 
the replication factor: just update it in zk, and when the Overseer process 
next runs, it will add and prune to match the latest value (given the number of 
nodes available).



                
> Create a Collections API for SolrCloud
> --------------------------------------
>
>                 Key: SOLR-3488
>                 URL: https://issues.apache.org/jira/browse/SOLR-3488
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrCloud
>            Reporter: Mark Miller
>            Assignee: Mark Miller
>         Attachments: SOLR-3488.patch, SOLR-3488.patch, SOLR-3488.patch, 
> SOLR-3488_2.patch
>
>



        
