[ https://issues.apache.org/jira/browse/SOLR-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13804392#comment-13804392 ]

Jack Krupansky commented on SOLR-5381:
--------------------------------------

bq. You are missing the point that it's very unlikely for anyone to query 
across all shards in a VERY LARGE cluster.

This gets back to my appeal for clarity on use cases. I mean, by definition, 
isn't the most common query case going to query across all shards of a 
collection? Sure, I suppose you could have an application with custom sharding 
such that the app always knows what shard will have the desired query results, 
such as a multitenant app which shards based on the userid field, but... isn't 
that a special case rather than a common case?

Now, maybe you meant simply to say that collections would tend to be smaller, 
but... you didn't explicitly say that.

So, once again, let's have some clarity about how many collections, how many 
shards per collection, and how many replicas per shard would need to be handled 
for various use cases of a proposed "very large" cluster.


> Split Clusterstate and scale 
> -----------------------------
>
>                 Key: SOLR-5381
>                 URL: https://issues.apache.org/jira/browse/SOLR-5381
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>
> clusterstate.json is a single point of contention for all components in 
> SolrCloud. It would be hard to scale SolrCloud beyond a few thousand nodes 
> because there are too many updates and too many nodes need to be notified of 
> the changes. As the number of nodes goes up, the size of clusterstate.json
> keeps growing, and it will soon exceed the limit imposed by ZK.
> The first step is to store the shard information in separate nodes so that
> each node can just listen to the shard node it belongs to. We may also need to
> split each collection into its own node, with clusterstate.json just
> holding the names of the collections.
> This is an umbrella issue.
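
The scaling argument in the description can be put into rough numbers. The sketch below is a hypothetical model, not Solr code. It counts how many watchers fire for one state change under a single shared clusterstate.json versus per-collection state nodes, and it checks an estimated single-znode size against ZooKeeper's default `jute.maxbuffer` of 0xfffff bytes (just under 1 MB). The node counts, replica counts, per-collection watcher counts, and 300 bytes per replica entry are all assumed values, not measurements.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Rough, hypothetical model of the two layouts discussed in SOLR-5381.
 * All inputs (node counts, replica counts, bytes per replica) are assumptions.
 */
public class ClusterStateModel {
    // ZooKeeper's default jute.maxbuffer (0xfffff bytes, just under 1 MB).
    static final long ZK_DEFAULT_MAX_ZNODE_BYTES = 0xfffffL;

    // Single clusterstate.json: every live node watches the same znode,
    // so each state change notifies all of them.
    static long notificationsShared(int liveNodes) {
        return liveNodes;
    }

    // Split layout: only the nodes watching the changed collection's znode fire.
    static long notificationsSplit(Map<String, Integer> watchersPerCollection, String changed) {
        return watchersPerCollection.getOrDefault(changed, 0);
    }

    // Estimated size of one znode that describes every replica in the cluster.
    static long sharedStateBytes(long totalReplicas, long bytesPerReplica) {
        return totalReplicas * bytesPerReplica;
    }

    public static void main(String[] args) {
        Map<String, Integer> watchers = new HashMap<>();
        watchers.put("logs", 40);   // assumed: 40 nodes host the "logs" collection
        watchers.put("users", 12);  // assumed: 12 nodes host the "users" collection
        int liveNodes = 3000;       // assumed cluster size
        long totalReplicas = 10_000; // assumed
        long bytesPerReplica = 300;  // assumed JSON size per replica entry

        System.out.println("shared layout notifications: " + notificationsShared(liveNodes));
        System.out.println("split layout notifications:  " + notificationsSplit(watchers, "users"));
        long bytes = sharedStateBytes(totalReplicas, bytesPerReplica);
        System.out.println("shared state ~" + bytes + " bytes, over ZK default limit: "
                + (bytes > ZK_DEFAULT_MAX_ZNODE_BYTES));
    }
}
```

With these assumed inputs, the shared layout notifies all 3000 nodes per change while the split layout notifies 12, and the estimated 3,000,000-byte shared state exceeds the default limit. Real numbers for collections, shards per collection, and replicas per shard would replace the assumptions.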



--
This message was sent by Atlassian JIRA
(v6.1#6144)
