There are practical limits, but they depend on your dataset, node sizing, Elasticsearch version, etc.
You'd be better off segregating indices by a higher-level grouping (e.g. customer number: 1-999, 1000-1999, and so on), using routing within each index, and then layering filtered aliases on top. Conceptually you get the same layout as a single index per customer, but you keep the option of splitting larger customers out into their own indices, without wasting resources on low-use customers. A sketch of the alias setup follows the quoted message below.

On 16 March 2015 at 19:11, Richard Blaylock <[email protected]> wrote:

> Hi all,
>
> We have a multi-tenant product and are leaning towards dynamically
> creating (and deleting) various indexes relevant to a tenant at runtime:
> as a tenant is created, so are that tenant's indexes. When a tenant is
> deleted, so are that tenant's indexes. Each index is specific to that
> tenant and could vary in size, but we do not expect any given index to
> ever be larger than a single disk (e.g. 80 GB).
>
> Due to index shard issues (the shard count is static, too many shards per
> index means a hit on performance (more map/reduce-style work to do),
> etc.), and due to the nature of our application, we are currently opting
> for a single-shard-per-index model - each index will have one and only
> one shard. We will have replicas for fault tolerance.
>
> On the surface, this appears to be an ideal design choice for
> multi-tenant applications: for any given index, one and only one shard
> will be 'hit' - no need to search across multiple shards, ever. It also
> reduces contention because indexes are always tenant-specific: as an
> index becomes larger, any slowness due to the large index *only* impacts
> the corresponding tenant (customer), whereas with the alternative - one
> index shared across tenants - one tenant's load could negatively impact
> other tenants' query performance.
>
> So for multi-tenancy, this single-shard-per-index model sounds ideal for
> our use case - the *only* issue is that the number of indexes increases
> dramatically as the number of tenants (customers) increases. Consider a
> system with 20,000 tenants, each having (potentially) hundreds,
> thousands, or even tens of thousands of indexes, resulting in millions of
> indexes overall. This is manageable from our product's perspective, but
> what impact would it have on Elasticsearch, if any?
>
> Are there practical limits? IIUC, there is a Lucene index per shard, so
> if there are hundreds of thousands or millions of Lucene indexes - other
> than disk space and file descriptor count per ES node, are there any
> other limits? Does performance degrade as the number of single-shard
> indexes increases? Or is there no problem at all?
>
> Thanks,
> Richard
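For what it's worth, here's a rough sketch of that layout (untested, and all the names - customers_0001_0999, customer_0042, customer_id, event - are placeholders, not anything from your setup). First create a shared index for a customer range, with customer_id mapped as not_analyzed so the alias filter matches exact values; then add a filtered alias with routing for each customer, so that customer's documents and searches stay on a single shard:

    # Hypothetical shared index for customers 1-999:
    PUT /customers_0001_0999
    {
      "settings": { "number_of_shards": 2, "number_of_replicas": 1 },
      "mappings": {
        "event": {
          "properties": {
            "customer_id": { "type": "string", "index": "not_analyzed" }
          }
        }
      }
    }

    # Per-customer "view" of the shared index: the filter scopes searches
    # to customer 42's documents, and the routing value pins them to one
    # shard.
    POST /_aliases
    {
      "actions": [
        {
          "add": {
            "index": "customers_0001_0999",
            "alias": "customer_0042",
            "routing": "42",
            "filter": { "term": { "customer_id": "42" } }
          }
        }
      ]
    }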
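The application then only ever talks to the alias. Because the alias points at exactly one index, you can index through it as well, and the routing and filter are applied automatically:

    # Index and search via the alias, never the underlying index:
    PUT /customer_0042/event/1
    { "customer_id": "42", "message": "signed in" }

    GET /customer_0042/_search
    { "query": { "match": { "message": "signed in" } } }

If customer 42 later outgrows the shared index, you reindex their documents into a dedicated index and repoint the alias; nothing changes on the application side.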
