Re: how to scale an ES deployment to millions of tenants with different data schemas

Ziv Shalev Wed, 17 Sep 2014 06:36:06 -0700

Thanks for the prompt reply!
One thing though: when using a single multi-tenant index, my concern is 
not the number of fields per doc (which is small, less than 50), 
but rather that, since each tenant has different fields, the 
accumulated number of fields in such an index will be huge.

I.e. tenant 1 has fields F11..F1n, tenant 2 has fields F21..F2n, and so on.
These fields are all distinct, so the number of fields in the multi-tenant 
index will quickly grow into the millions.
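
To make the arithmetic concrete, here is a rough sketch of how the distinct field count of a shared index grows when no two tenants share a field name. The field naming scheme (`F<tenant>_<i>`, with a separator so names cannot collide) and the figure of 50 fields per tenant are illustrative assumptions, not real data.

```python
def tenant_fields(tenant, fields_per_tenant):
    """Hypothetical field names for one tenant: F<tenant>_1 .. F<tenant>_n."""
    return {"F{}_{}".format(tenant, i) for i in range(1, fields_per_tenant + 1)}


def shared_index_fields(num_tenants, fields_per_tenant):
    """Count the distinct field names a single shared index would accumulate."""
    seen = set()
    for tenant in range(1, num_tenants + 1):
        # Every tenant contributes only new names, so the set keeps growing.
        seen |= tenant_fields(tenant, fields_per_tenant)
    return len(seen)


if __name__ == "__main__":
    for tenants in (10, 1000, 10000):
        print(tenants, "tenants ->", shared_index_fields(tenants, 50), "fields")
```

Because the names never overlap, the total is simply tenants times fields per tenant, so a million tenants at up to 50 fields each would mean tens of millions of distinct fields in one index.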

Will such an indexing approach work in ES?

Thanks!

On Wednesday, September 17, 2014 4:21:17 PM UTC+3, Itamar Syn-Hershko wrote:
>
> First, you should really read this: 
> http://aphyr.com/posts/317-call-me-maybe-elasticsearch regarding using ES 
> as a single source of truth
>
> Millions of indexes is not advisable, unless you plan on having millions 
> of servers. Depending on index size and write frequency to them, you don't 
> want to have more than a few dozen indexes per machine (including 
> replicas). This is because of concerns of memory, CPU, I/O and file 
> descriptors.
>
> One big single index may present its own problems due to the different 
> schemas, although it may be solvable using dynamic index templates 
> <http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/custom-dynamic-mapping.html#dynamic-templates>.
>  
> I would still expect you to have issues with the number of shards (basically, 
> running out of shards at some point).
>
> Therefore I would try to find a middle way here, probably using some sort 
> of mapping mechanism, possibly time-based as well if that's applicable.
>
> Re your questions:
>
> * are there production deployments out there that have a million active 
> indexes? what do they look like?
>
> I'm not aware of such
>
> * how many different fields does it make sense to host in a single index? 
> would it scale to millions of fields in a single index?
>
> You mean in a single document. I recall seeing Shay suggesting not to go 
> over a threshold of 100 or so. Lucene really isn't optimized for scaling 
> vertically, especially at the document level.
>
> * are there other ways to go about this that we have overlooked?
>
> Maybe look at your data model and try to re-arrange it.
>
> --
>
> Itamar Syn-Hershko
> http://code972.com | @synhershko <https://twitter.com/synhershko>
> Freelance Developer & Consultant
> Author of RavenDB in Action <http://manning.com/synhershko/>
>
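
One possible reading of the "mapping mechanism" suggested in the quoted reply is to hash each tenant onto a fixed, small set of shared indexes, optionally with a time suffix. The sketch below is only an illustration under that assumption: the bucket count, the `tenants-` naming scheme, and the month format are all hypothetical, and it bounds the number of indexes, not the number of distinct fields per index.

```python
import hashlib

NUM_BUCKETS = 64  # hypothetical; would be sized to what the cluster can hold


def index_for_tenant(tenant_id, num_buckets=NUM_BUCKETS, month=None):
    """Map a tenant to one of a fixed number of shared index names.

    A stable hash (MD5 here, used only for distribution, not security)
    keeps the mapping identical across processes and restarts.
    """
    digest = hashlib.md5(tenant_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % num_buckets
    name = "tenants-{:03d}".format(bucket)
    if month is not None:
        # Optional time-based suffix, e.g. "2014.09" (format is an assumption).
        name += "-" + month
    return name


if __name__ == "__main__":
    print(index_for_tenant("tenant-42"))
    print(index_for_tenant("tenant-42", month="2014.09"))
```

Each document would still need to carry its tenant id so that queries can filter on it, and tenants sharing a bucket still pool their fields in that bucket's mapping.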
