[ https://issues.apache.org/jira/browse/SOLR-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13804419#comment-13804419 ]
Noble Paul commented on SOLR-5381:
----------------------------------

bq. isn't the most common query case going to query across all shards of a collection?

If you have tens of thousands of shards, any distributed search across all the shards will be too slow/expensive. The most common use case at that scale would be a search that spans a single shard or a handful of shards. (It is not custom sharding; it is probably going to use the CompositeId router.) If you are building a personalized website serving millions of users, this would be the common use case, e.g. a mail service, a file storage service, geographically localized search, etc.

bq. Now, maybe you meant simply to say that collections would tend to be smaller

I don't wish to limit scaling to a large number of small collections or vice versa. That should be the choice of the user, though we can prioritize one kind of scaling over the other. Tens of thousands of collections or hundreds of thousands of shards would be the ultimate aim. We won't reach there in one step; it has to be iterative.

bq. So, once again, let's have some clarity about how many collections

The point is, we didn't build SolrCloud with a specific number in mind. The objective was to scale as much as possible. The next logical step is to scale a lot higher by eliminating the known bottlenecks one by one.

> Split Clusterstate and scale
> -----------------------------
>
>                 Key: SOLR-5381
>                 URL: https://issues.apache.org/jira/browse/SOLR-5381
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>
> clusterstate.json is a single point of contention for all components in
> SolrCloud. It would be hard to scale SolrCloud beyond a few thousand nodes,
> because there are too many updates and too many nodes need to be notified of
> the changes. As the number of nodes goes up, the size of clusterstate.json keeps
> growing, and it will soon exceed the size limit imposed by ZooKeeper.
> The first step is to store each shard's information in a separate ZooKeeper node, so that
> each Solr node can listen only to the shard node it belongs to. We may also need to
> split each collection into its own node, with clusterstate.json holding just
> the names of the collections.
> This is an umbrella issue.
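The split described above can be sketched as follows. This is a minimal illustration, not the actual design: the znode paths (`/collections/<name>/state.json`) and the shape of the top-level document are assumptions made for the sketch.

```python
import json

# A hypothetical monolithic clusterstate.json, as stored in a single
# ZooKeeper node today (contents are illustrative).
clusterstate = {
    "collection1": {"shards": {"shard1": {"replicas": {}}}},
    "collection2": {"shards": {"shard1": {"replicas": {}}}},
}

# Split it: one znode per collection, plus a lightweight top-level
# document that holds only the collection names.
per_collection = {
    f"/collections/{name}/state.json": json.dumps(state)
    for name, state in clusterstate.items()
}
top_level = json.dumps({"collections": sorted(clusterstate)})

# A node hosting only collection1 would now set a ZooKeeper watch on
# /collections/collection1/state.json and never be notified about
# updates to any other collection.
```

The point of the layout is that a state change in one collection touches one small znode, so only the nodes serving that collection receive a watch notification, and no single document grows with the total number of shards.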
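The single-shard routing that the comment relies on comes from composite-id hashing: the hash of the part of the id before `!` supplies the upper bits of the routing hash, so all documents sharing a prefix fall into one hash range and hence one shard. A simplified Python sketch, using `crc32` as a stand-in hash (Solr actually uses MurmurHash3) and a 16/16 bit split chosen for illustration:

```python
import zlib

def composite_id_hash(doc_id: str) -> int:
    """Route 'prefix!rest' ids: the prefix hash fills the upper 16 bits,
    the rest fills the lower 16 bits, so a shared prefix pins the doc
    to one hash range. crc32 is a stand-in for Solr's MurmurHash3."""
    if "!" in doc_id:
        prefix, rest = doc_id.split("!", 1)
        upper = zlib.crc32(prefix.encode()) & 0xFFFF
        lower = zlib.crc32(rest.encode()) & 0xFFFF
        return (upper << 16) | lower
    return zlib.crc32(doc_id.encode()) & 0xFFFFFFFF

def shard_for(doc_id: str, num_shards: int) -> int:
    # Map the 32-bit hash onto equal hash ranges, one per shard.
    return composite_id_hash(doc_id) * num_shards >> 32

# All of one user's documents route to the same shard, so a query for
# that user can be served from a single shard:
shards = {shard_for(f"user42!doc{i}", 16) for i in range(100)}
assert len(shards) == 1
```

This is why a query scoped to one user (mail, file storage, and similar per-tenant workloads) never needs to fan out across tens of thousands of shards.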