[ https://issues.apache.org/jira/browse/SOLR-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13804419#comment-13804419 ]

Noble Paul commented on SOLR-5381:
----------------------------------

bq. isn't the most common query case going to query across all shards of a 
collection?

If you have 10,000s of shards, any distributed search across all of them will 
be too slow and expensive. The most common use case at that scale is a search 
that spans a single shard or a handful of shards. (This is not custom 
sharding; it would most likely use the compositeId router.) If you are 
building a personalized website serving millions of users, this is the common 
use case, e.g. a mail service, a file storage service, geographically 
localized search, etc.
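
To make that concrete, here is a minimal SolrJ sketch of compositeId routing. 
The collection name "mail", the field names, and the ZooKeeper address are 
illustrative placeholders, not part of this issue:

{code:java}
import java.util.List;
import java.util.Optional;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrInputDocument;

public class CompositeIdRoutingSketch {
  public static void main(String[] args) throws Exception {
    // Placeholder ZooKeeper address and collection name.
    try (CloudSolrClient client = new CloudSolrClient.Builder(
        List.of("localhost:2181"), Optional.empty()).build()) {
      client.setDefaultCollection("mail");

      // With the compositeId router, an id like "user42!msg-1001" hashes
      // on the "user42" prefix, so all of that user's docs land on one shard.
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "user42!msg-1001");
      doc.addField("subject_s", "hello");
      client.add(doc);
      client.commit();

      // _route_ restricts the query to the shard(s) owning the prefix's
      // hash range instead of fanning out to every shard.
      SolrQuery q = new SolrQuery("subject_s:hello");
      q.set("_route_", "user42!");
      QueryResponse rsp = client.query(q);
      System.out.println("hits: " + rsp.getResults().getNumFound());
    }
  }
}
{code}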

bq. Now, maybe you meant simply to say that collections would tend to be smaller

I don't wish to limit scaling to a large number of small collections or vice 
versa; that should be the user's choice. But I can prioritize one kind of 
scaling over the other. 10,000s of collections or 100,000s of shards would be 
the ultimate aim. We won't get there in one step; it has to be iterative.


bq. So, once again, let's have some clarity about how many collections

The point is, we didn't build SolrCloud with a specific number in mind. The 
objective was to scale as far as possible. The next logical step is to scale 
much higher by eliminating the known bottlenecks one by one.
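
For illustration, here is what eliminating the main bottleneck looks like at 
the ZK level, per the description below: each node watches only the state 
znode of its own collection rather than the global clusterstate.json. A 
minimal sketch with the plain ZooKeeper client, assuming a hypothetical 
per-collection znode like /collections/<name>/state.json (the actual layout 
is precisely what this issue has to settle):

{code:java}
import java.nio.charset.StandardCharsets;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class PerCollectionStateWatcher {
  public static void main(String[] args) throws Exception {
    // Hypothetical per-collection znode; the real path is up to this issue.
    final String statePath = "/collections/mycollection/state.json";
    final ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> { });

    // ZK watches are one-shot, so the watcher re-registers itself on
    // every notification by calling getData() again.
    Watcher watcher = new Watcher() {
      @Override
      public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeDataChanged) {
          try {
            byte[] data = zk.getData(statePath, this, null);
            System.out.println("collection state changed: "
                + new String(data, StandardCharsets.UTF_8));
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      }
    };

    // Initial read plants the first watch. Only watchers of this one
    // znode are notified, not every node in the cluster.
    zk.getData(statePath, watcher, null);
    Thread.sleep(Long.MAX_VALUE);
  }
}
{code}

With a layout like that, a state change in one collection wakes only the 
nodes hosting it, and clusterstate.json can shrink to little more than the 
list of collection names.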

> Split Clusterstate and scale 
> -----------------------------
>
>                 Key: SOLR-5381
>                 URL: https://issues.apache.org/jira/browse/SOLR-5381
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>
> clusterstate.json is a single point of contention for all components in 
> SolrCloud. It would be hard to scale SolrCloud beyond a few thousand nodes 
> because there are too many updates and too many nodes need to be notified of 
> the changes. As the number of nodes goes up, the size of clusterstate.json 
> keeps growing, and it will soon exceed the limit imposed by ZK.
> The first step is to store the shard information in separate nodes so that 
> each node can listen to just the shard node it belongs to. We may also need 
> to split each collection into its own node, with clusterstate.json holding 
> only the names of the collections.
> This is an umbrella issue.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
