Re: Upper bounds on the number of indexes in an elastic search cluster

[email protected] Fri, 26 Sep 2014 16:45:07 -0700

If you consider tens of thousands of indices on tens of thousands of nodes,
and the master node is the only node that can write to the cluster state,
the master will have a lot of work to do to keep up with all cluster state
updates.


When the rate of changes to the cluster state increases, the master node
will be challenged to propagate the state changes to all other nodes
reliably and fast enough. There is no "distributed tree" yet; for example,
there are no special "forwarder nodes" that communicate the cluster state
to a partitioned set of nodes.

See https://github.com/elasticsearch/elasticsearch/issues/6186

The cluster state is a big compressed JSON structure which must also fit
into the heap memory of master-eligible nodes. ES also uses privileged
network traffic channels for cluster state communication, so that it takes
precedence over ordinary index/search messaging. But all these precautions
may not be enough at some point. You can observe that point by retrieving a
growing cluster state over the cluster state API and measuring the size and
time of this request.
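
A minimal sketch of such a measurement in Python, using only the standard
library. It assumes a node reachable over HTTP at http://localhost:9200
(a placeholder address); GET /_cluster/state is the cluster state API. The
function name measure_cluster_state and its opener parameter are made up
for illustration. Note that this measures the JSON body returned over HTTP,
which may differ from the size of the state as exchanged between nodes.

```python
import time
import urllib.request


def measure_cluster_state(base_url="http://localhost:9200",
                          opener=urllib.request.urlopen):
    """Fetch the cluster state once.

    Returns a tuple (size_in_bytes, seconds_elapsed) for the request.
    base_url is an assumed address; opener can be swapped out for testing.
    """
    url = base_url.rstrip("/") + "/_cluster/state"
    start = time.perf_counter()
    # Read the whole response so the timing covers the full transfer.
    with opener(url) as response:
        body = response.read()
    elapsed = time.perf_counter() - start
    return len(body), elapsed
```

Calling measure_cluster_state() at regular intervals while indices are
being created, and logging both numbers, should show how quickly the state
grows and when retrieving it starts to get slow.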

On the other hand, you can have a calm cluster state and many thousands of
indices when the type mappings are constant, no field updates occur, and no
nodes connect/disconnect. How you have to operate the cluster depends on
the situation. One important thing is to allocate enough resources to
master-eligible data-less nodes so they are not hindered by extra
search/index load.

N.B. 20 nodes is not a big cluster. There are ES clusters of hundreds and
thousands of nodes. From my understanding, the communication between the
master and 20 nodes is not a serious problem. This becomes an issue at
~500-1000 nodes.

Jörg




On Sat, Sep 27, 2014 at 1:12 AM, Todd Nine <[email protected]> wrote:

> Hi Jorg,
>   We're storing each application in its own index so we can manage it
> independently of others.  There's no set load or usage on our
> applications.  Some will be very small, a few hundred documents.  Others
> will be quite large, in the billions.  We have no way of knowing what the
> usage profile is.  Rather, we're initially thinking that expansion will
> occur using a combination of additional indexes and aliases referencing
> those indexes.  This allows us to automate the management of the aliases
> and indexes, and in turn allows us to scale them to the needs of the
> application without over-allocating unused capacity.  For instance, with
> write-heavy applications we can allocate more shards (via alias); for
> read-heavy applications we can allocate more replicas.
>
>
>
> We're not running our cluster on a single node.  Our cluster is small to
> begin with; it's 6 nodes in our current POC.  Ultimately I expect us to
> grow each cluster we stand up to 20 or so nodes.  We'll expand as necessary
> to support the number of shards and replicas and keep our performance up.
> I'm not particularly worried about our ability to scale horizontally with
> our hardware.
>
> Rather, I'm concerned with how far we can scale our number of indexes,
> and how that relates to the number of machines.  When we keep adding
> hardware, does this increase the upper bounds of the number of indexes we
> can have?  Not the physical shards and replicas, but the routing
> information for the master of the shards and location of the replicas.
> I've done distributed data storage for many years, and none of the
> documentation on ES makes it clear if this becomes an issue operationally.
> I'm leery of just assuming it will "just work".  When implementing something
> like this, you either have to do a distributed tree for your meta data to
> get the partitioning you need to scale infinitely, or every node must store
> every shard's master information.   How does it work in ES?
>
>
> Thanks,
> Todd
>
>
>
> On Friday, September 26, 2014 4:29:53 PM UTC-6, Jörg Prante wrote:
>>
>> Why do you want to create a huge number of indexes on just a single node?
>>
>> There are smarter methods to scale. Use over-allocation of shards. This
>> is explained by kimchy in this thread:
>>
>> http://elasticsearch-users.115913.n3.nabble.com/Over-allocation-of-shards-td3673978.html
>>
>> TL;DR you can create many thousands of aliases on a single index (or a
>> few indices) with just a few shards. ES defines no limit; when your
>> configuration / hardware capacity is exceeded, you will see the node
>> getting sluggish.
>>
>> Jörg
>>
>> On Fri, Sep 26, 2014 at 11:23 PM, Todd Nine <[email protected]> wrote:
>>
>>> Hey guys.  We're building a multi-tenant application, where users create
>>> applications within our single server.  For our current ES scheme, we're
>>> building an index per application.  Are there any stress tests or
>>> documentation on the upper bounds of the number of indexes a cluster can
>>> handle?  From my current understanding of metadata and routing, every node
>>> caches the metadata of all the indexes and shards for routing.  At some
>>> point, this will obviously overwhelm the node.  Is my current understanding
>>> correct, or is this information partitioned across the cluster as well?
>>>
>>>
>>> Thanks,
>>>
>>> Todd
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/200e1ce7-c56f-49d4-9c02-4b1dcc570bf2%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>

