Re: how to scale an ES deployment to millions of tenants with different data schemas

Itamar Syn-Hershko Wed, 17 Sep 2014 06:38:57 -0700

This will still mean less overhead than having those distinct fields in
discrete indexes. I wouldn't worry about that.


--

Itamar Syn-Hershko
http://code972.com | @synhershko <https://twitter.com/synhershko>
Freelance Developer & Consultant
Author of RavenDB in Action <http://manning.com/synhershko/>

On Wed, Sep 17, 2014 at 4:35 PM, Ziv Shalev <[email protected]> wrote:

> thanks for the prompt reply!
> one thing though - when using a single multi-tenant index, my concerns are
> not around the number of fields per doc (which is small, less than 50),
> but rather the fact that since each tenant has different fields, the
> accumulated number of fields in such an index will be huge.
>
> i.e. tenant 1 has fields F11..F1n, tenant 2 has fields F21..F2n, ...
> these fields are distinct so the number of fields for the multi-tenant
> index will grow to millions quickly.
>
> will such an indexing methodology work in ES?
>
> thanks!
>
>
> On Wednesday, September 17, 2014 4:21:17 PM UTC+3, Itamar Syn-Hershko
> wrote:
>>
>> First, you should really read this:
>> http://aphyr.com/posts/317-call-me-maybe-elasticsearch
>> regarding using ES as a single source of truth
>>
>> Millions of indexes are not advisable, unless you plan on having millions
>> of servers. Depending on index size and how frequently you write to them,
>> you don't want more than a few dozen indexes per machine (including
>> replicas). This is because of memory, CPU, I/O and file descriptor
>> concerns.
>>
>> One big single index may present its own problems due to the different
>> schemas, although that may be solvable using dynamic templates
>> <http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/custom-dynamic-mapping.html#dynamic-templates>.
>> I would still expect you to run into issues with the number of shards
>> (basically, running out of shards at some point).
>>
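A minimal sketch of what such dynamic templates could look like, assuming an ES 1.x-style mapping (newer releases use a different syntax) and a hypothetical naming convention in which tenants suffix their field names with a type hint (`_i`, `_dt`). The rule names, the suffixes and the helper functions are illustrative assumptions, not something from this thread. `template_for` only imitates a first-match lookup locally with `fnmatch`, so the rules can be checked without a cluster.

```python
import json
from fnmatch import fnmatchcase

# Hypothetical convention: tenants suffix field names with a type hint.
# The rule names and suffixes are assumptions made for this sketch.
DYNAMIC_TEMPLATES = [
    {"integers": {"match": "*_i", "mapping": {"type": "long"}}},
    {"dates": {"match": "*_dt", "mapping": {"type": "date"}}},
    # Catch-all: keep any other string field un-analyzed (ES 1.x syntax).
    {"strings": {
        "match": "*",
        "match_mapping_type": "string",
        "mapping": {"type": "string", "index": "not_analyzed"},
    }},
]

def index_body():
    """Build a create-index request body that carries the templates."""
    return {"mappings": {"_default_": {"dynamic_templates": DYNAMIC_TEMPLATES}}}

def template_for(field_name):
    """Return the name of the first template whose `match` pattern fits.

    Local approximation only: it ignores match_mapping_type and is not
    Elasticsearch's own matching code.
    """
    for template in DYNAMIC_TEMPLATES:
        for name, rule in template.items():
            if fnmatchcase(field_name, rule["match"]):
                return name
    return None

if __name__ == "__main__":
    print(json.dumps(index_body(), indent=2))
    print(template_for("t42_order_count_i"))
```

Templates like these only control how newly seen fields get mapped. Every distinct field name still ends up in the index mapping, so they do not by themselves reduce the accumulated field count.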
>> Therefore I would try to find a middle way here, probably using some sort
>> of mapping mechanism, possibly also time-based if that's applicable.
>>
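One way to read the middle way above is to assign tenants to a bounded pool of shared indexes, optionally split by month. The sketch below assumes that reading; the pool size, the `tenants-` naming scheme and the function names are all hypothetical.

```python
import hashlib
from datetime import date

NUM_BUCKETS = 64  # assumed pool size; tune to cluster capacity

def bucket_for(tenant_id, num_buckets=NUM_BUCKETS):
    """Map a tenant id to a stable bucket number in [0, num_buckets)."""
    # md5 is used only as a stable hash; Python's built-in hash() is
    # randomized per process for strings and would not be repeatable.
    digest = hashlib.md5(tenant_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets

def index_for(tenant_id, day=None, num_buckets=NUM_BUCKETS):
    """Return the shared index name for a tenant, monthly if `day` is given."""
    name = "tenants-%03d" % bucket_for(tenant_id, num_buckets)
    if day is not None:
        name += "-%04d.%02d" % (day.year, day.month)
    return name

if __name__ == "__main__":
    print(index_for("tenant-1"))
    print(index_for("tenant-1", date(2014, 9, 17)))
```

With a scheme like this, each document would also need a tenant id field, and every query a filter on it, so that tenants sharing an index stay isolated from each other.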
>> Re your questions:
>>
>> * are there production deployments out there that have a million active
>> indexes? what do they look like?
>>
>> I'm not aware of any.
>>
>> * how many different fields does it make sense to host in a single index?
>> would it scale to millions of fields in a single index?
>>
>> You mean in a single document. I recall seeing Shay suggest not going
>> over a threshold of 100 or so. Lucene really isn't optimized for scaling
>> vertically, especially at the document level.
>>
>> * are there other ways to go about this that we have overlooked?
>>
>> Maybe look at your data model and try to re-arrange it.
>>
>> --
>>
>> Itamar Syn-Hershko
>> http://code972.com | @synhershko <https://twitter.com/synhershko>
>> Freelance Developer & Consultant
>> Author of RavenDB in Action <http://manning.com/synhershko/>
>>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/23a8484a-dcfc-4c8a-bc9d-a02bc4280985%40googlegroups.com
> <https://groups.google.com/d/msgid/elasticsearch/23a8484a-dcfc-4c8a-bc9d-a02bc4280985%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

