Hi Brad

I agree with what Mark and Zachary have said and will expand on their points.

Firstly, shard and index level operations in ElasticSearch are 
peer-to-peer. Single-shard operations will affect at most 2 nodes: the node 
receiving the request and the node hosting an instance of the shard 
(primary or replica depending on the operation). Multi-shard operations 
(such as searches) will affect from one to (N + 1) nodes, where N is the 
number of shards in the index. 
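
To make that concrete, a get-by-id is routed to a single shard copy whereas 
an unrouted search fans out to every shard. A rough illustration in Python 
(assuming the requests library, a node on localhost:9200, and made-up index 
and type names):

    import requests

    ES = "http://localhost:9200"

    # Get by id: routed to one shard copy, so at most two nodes are
    # involved (the coordinating node plus the node holding that primary
    # or replica).
    requests.get(ES + "/logs-2014-01/event/12345")

    # Unrouted search: fanned out to one copy of every shard, so up to
    # (N + 1) nodes are involved for an index with N shards.
    requests.get(ES + "/logs-2014-01/_search", params={"q": "status:error"})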

So from an index/shard operation perspective there is no reason to split 
into two clusters. The key issue with index/shard operations is whether the 
cluster is able to handle the traffic volume. So if you do decide to split 
into two clusters you will need to look at the load profile for each of 
your client types to determine how much raw processing power you need in 
each cluster. It may be that a 10:20 split between clusters is more optimal 
than a 15:15 split for balancing request traffic, and therefore CPU 
utilisation, across all nodes. If you go with one cluster this is not an 
issue because you can move shards between nodes to balance the request 
traffic.
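
If you do want to nudge that balance by hand, shards can be moved with the 
cluster reroute API. A hedged sketch (index, shard number and node names 
are placeholders):

    import json
    import requests

    ES = "http://localhost:9200"

    # Move shard 3 of a busy index from a hot node to a quieter one.
    command = {"commands": [{"move": {
        "index": "logs-2014-01",
        "shard": 3,
        "from_node": "node-hot",
        "to_node": "node-quiet"
    }}]}
    requests.post(ES + "/_cluster/reroute", data=json.dumps(command))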

Larger clusters also imply more work for the cluster master in managing the 
cluster. This comes down to the number of nodes that the master has to 
communicate with, and manage, and the size of the cluster state. A cluster 
with 30 nodes is not too large for a master to keep track of. There will be 
an increase in network traffic associated with the increase in volume of 
master-to-worker and worker-to-master pings used to detect the 
presence/absence of nodes. This can be offset by increasing the respective 
ping intervals so that nodes ping each other less frequently.
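
The fault detection pings are controlled by the discovery.zen.fd.* settings 
in elasticsearch.yml; something like the following would ping less often 
(the values here are purely illustrative, the defaults are 1s/30s/3):

    discovery.zen.fd.ping_interval: 5s
    discovery.zen.fd.ping_timeout: 60s
    discovery.zen.fd.ping_retries: 3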

In a large cluster it is good practice to have a group of dedicated master 
nodes, say 3, from which the master is elected. These nodes do not host any 
user data, meaning that cluster management is not compromised by high user 
request traffic. 
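
One possible elasticsearch.yml layout for this (values illustrative):

    # On the three dedicated master nodes: master-eligible, hold no data.
    node.master: true
    node.data: false

    # On the worker (data) nodes: hold data, never become master.
    node.master: false
    node.data: true

    # With three master-eligible nodes, require a majority for election.
    discovery.zen.minimum_master_nodes: 2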

The size of the cluster state may be more of an issue. The cluster state 
comprises all of the information about the cluster configuration. The 
cluster state has records for each node, index, document mapping, shard, 
etc. Whenever there is a change to the cluster state it is first made by 
the master which then sends the updated cluster state to each worker node. 
Note that the entire cluster state is sent, not just the changes! It is 
therefore highly desirable to limit the frequency of changes to the 
cluster state, primarily by minimizing dynamic field mapping updates, and 
the overall size of the cluster state, primarily by minimizing the number 
of indices. 
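
One way to keep dynamic mapping updates out of the cluster state is to 
declare the mappings up front and mark the types as strict, so that an 
unexpected field is rejected rather than mapped on the fly. A hedged sketch 
(made-up index and type names, trimmed mapping):

    import json
    import requests

    ES = "http://localhost:9200"

    # "dynamic": "strict" rejects documents containing unmapped fields, so
    # indexing traffic cannot trigger mapping (cluster state) updates.
    mapping = {"event": {
        "dynamic": "strict",
        "properties": {
            "timestamp": {"type": "date"},
            "message":   {"type": "string"}
        }
    }}
    requests.put(ES + "/logs-2014-01/_mapping/event", data=json.dumps(mapping))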

In your proposed model the size of the cluster state associated with the 
set of 60 shared month indices will be larger than that of one set of 60 
dedicated month indices by virtue of having 100 shards rather than 6. 
However, it may not be much bigger, because there will be much more 
metadata associated with defining the index structure, notably the field 
mappings for all document types in the index, than metadata defining the 
shards of the index. So it may well be that the size of the cluster state 
associated with 60 "shared" month indices plus N sets of 60 "dedicated" 
indices is not much more than that of (N + 1) sets of 60 "dedicated" 
indices, in which case there may not be much point in splitting into two 
clusters. A quick way to check this for your actual data model (sketched in 
code below) is to:
  1. Set up an index in ES with mappings for all document types and 6 
shards and 0 replicas,
  2. Retrieve the index metadata JSON using ES admin API,  
  3. Increase the number of replicas to 16 (102 shards total),
  4. Retrieve the index metadata JSON using ES admin API,
  5. Compare the two JSON documents from 2 and 4. 
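
A rough Python sketch of those steps, assuming a throwaway test node on 
localhost:9200 and a made-up index name; it compares the size of the whole 
serialised cluster state, which also captures the growth of the routing 
table:

    import json
    import requests

    ES = "http://localhost:9200"

    def cluster_state_size():
        """Size in bytes of the serialised cluster state JSON."""
        return len(json.dumps(requests.get(ES + "/_cluster/state").json()))

    # 1. Index with 6 shards, 0 replicas; put your full document type
    #    mappings in here, they dominate the index metadata.
    requests.put(ES + "/statetest", data=json.dumps({
        "settings": {"number_of_shards": 6, "number_of_replicas": 0},
        "mappings": {}
    }))
    size_6_shards = cluster_state_size()      # steps 1 and 2

    # 3. Increase replicas to 16, i.e. 6 x (1 + 16) = 102 shards in total.
    requests.put(ES + "/statetest/_settings", data=json.dumps(
        {"index": {"number_of_replicas": 16}}))
    size_102_shards = cluster_state_size()    # step 4

    # 5. Compare.
    print(size_6_shards, size_102_shards, size_102_shards - size_6_shards)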

As stated above, it is desirable to minimize the number of indices. Each 
shard is a Lucene index which consumes memory and requires open file 
descriptors from the OS for segment data files and Lucene index level 
files. You may find yourself running out of memory and/or file descriptors 
if you are not careful. 
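
You can keep an eye on this through the node stats API; for example (a 
sketch, again assuming a local node):

    import requests

    ES = "http://localhost:9200"

    # Open file descriptors per node, from the process section of node stats.
    stats = requests.get(ES + "/_nodes/stats").json()
    for node in stats["nodes"].values():
        print(node["name"], node["process"]["open_file_descriptors"])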

I understand you are looking for a design that will cater for the on-disc 
data volume. Given that your data is split into monthly indices it may well 
be that no one index, either "shared" or "dedicated", will reach that 
volume in one month. There may also be seasonal factors to consider, 
whereby one or two months have much higher volumes than others. I have 
read/heard about cases where a monthly index architecture was implemented 
but later scrapped in favour of a single-index approach because the 
month-to-month variation in volume was detrimental to overall system 
resource utilisation and performance.

In your case, think about whether monthly indices are really appropriate. 
An alternative model is to partition one year's worth of data into a set of 
indices bounded by size rather than time. In this model a new index is 
started on Jan 01 and data is added to it until it reaches some predefined 
size limit, at which point a new index is created to accept new data from 
that point on. This is repeated until year end, at which point you might 
end up with data for Jan/Feb/Mar and half of Apr in index 2014-01, the rest 
of Apr plus May - Oct in index 2014-02, and Nov/Dec in index 2014-03. This 
way you end up using the smallest number of indices within the constraints 
of manageable shard size and overall user data volume. This may also be a 
better approach than indices with 100 shards. It does, however, come at the 
cost of more complexity when it comes to accessing data across multiple 
indices, but this is a one-off development cost rather than an ongoing 
maintenance cost.
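
A minimal sketch of that rolling logic, assuming a made-up write alias and 
an arbitrary size limit; the real thing would live in whatever process 
drives your indexing:

    import json
    import requests

    ES = "http://localhost:9200"
    MAX_PRIMARY_BYTES = 50 * 1024 ** 3   # illustrative per-index size limit

    def primary_store_bytes(index):
        """On-disc size of the primary shards of an index."""
        stats = requests.get(ES + "/" + index + "/_stats").json()
        return stats["indices"][index]["primaries"]["store"]["size_in_bytes"]

    def roll_if_needed(current_index, year, sequence):
        """Create the next size-bounded index (e.g. 2014-02) once the
        current one passes the limit and repoint the write alias at it."""
        if primary_store_bytes(current_index) < MAX_PRIMARY_BYTES:
            return current_index
        next_index = "%d-%02d" % (year, sequence + 1)
        requests.put(ES + "/" + next_index)
        requests.post(ES + "/_aliases", data=json.dumps({"actions": [
            {"remove": {"index": current_index, "alias": "current-write"}},
            {"add":    {"index": next_index,    "alias": "current-write"}}
        ]}))
        return next_index

Searching across the whole year then just means pointing a search alias (or 
a wildcard such as 2014-*) at the full set of indices.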

Also, rather than focusing so much on shard size, look at the number and 
size of the segment files comprising a shard. Remember that Lucene segments 
are essentially immutable (except for marking deletes) after they are 
written to disc. This means that some index management operations may 
reduce down to simply copying/moving one or two segment files across the 
network, rather than all segment files for a given shard. For example, when 
allocating shards to nodes ElasticSearch tries to allocate shards to nodes 
that already have some of that shard's segment data. The new incremental 
backup/restore in ES V1 also takes advantage of this segment immutability. 
With this in mind you might be able to support shards > 5G consisting of 
more segment files of bounded size rather than fewer segment files of 
unbounded size (and ultimately a single 5G segment file).    
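
The segments API shows the segment files behind each shard, and the merge 
policy's max_merged_segment setting (5gb by default) is what bounds how 
large merged segments can grow. A hedged sketch (made-up index name; if 
your version does not allow updating the setting dynamically, set it at 
index creation instead):

    import json
    import requests

    ES = "http://localhost:9200"

    # List the segments behind each shard copy of an index.
    segs = requests.get(ES + "/logs-2014-01/_segments").json()
    for shard_copies in segs["indices"]["logs-2014-01"]["shards"].values():
        for copy in shard_copies:
            for name, seg in copy["segments"].items():
                print(name, seg["num_docs"], seg["size_in_bytes"])

    # Keep merged segments below a chosen ceiling rather than the default.
    requests.put(ES + "/logs-2014-01/_settings", data=json.dumps(
        {"index.merge.policy.max_merged_segment": "2gb"}))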

Hope this helps,
Regards
Mauri

