Amazing! Thank you @Mauri, that is very helpful. Our reasoning behind monthly indexes was simply that all of our queries have a date range and a tenantID. We route by tenantID within the "shared" monthly indexes, and since we can work out which monthly indexes cover a given date range, we pass only those index names to the ES API. This limits both the indexes and the shards that are queried when a single tenant asks for data within a date range.
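For concreteness, a minimal sketch of that pattern (assuming the official Python elasticsearch-py client; the "events-YYYY.MM" naming scheme and the tenantID/timestamp field names here are illustrative, not our actual schema):

    from datetime import date

    from elasticsearch import Elasticsearch

    es = Elasticsearch()


    def monthly_indices(start, end, prefix="events-"):
        """Expand a date range into the monthly index names covering it."""
        names, year, month = [], start.year, start.month
        while (year, month) <= (end.year, end.month):
            names.append("%s%04d.%02d" % (prefix, year, month))
            year, month = (year + 1, 1) if month == 12 else (year, month + 1)
        return names


    def search_tenant(tenant_id, start, end, query):
        # Name only the monthly indices that cover the date range, and
        # route by tenantID so each index is searched on a single shard.
        return es.search(
            index=",".join(monthly_indices(start, end)),
            routing=tenant_id,
            body={"query": {"bool": {"must": [
                query,
                {"term": {"tenantID": tenant_id}},
                {"range": {"timestamp": {"gte": start.isoformat(),
                                         "lte": end.isoformat()}}},
            ]}}},
        )


    hits = search_tenant("tenant-42", date(2014, 1, 15), date(2014, 3, 20),
                         {"match": {"message": "timeout"}})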
Thanks,
Brad

On Thursday, March 20, 2014 8:01:14 PM UTC-6, Mauri wrote:

> Hi Brad
>
> I agree with what Mark and Zachary have said and will expand on these points.
>
> Firstly, shard- and index-level operations in Elasticsearch are peer-to-peer. Single-shard operations will affect at most 2 nodes: the node receiving the request and the node hosting an instance of the shard (primary or replica, depending on the operation). Multi-shard operations (such as searches) will affect from one to (N + 1) nodes, where N is the number of shards in the index.
>
> So from an index/shard operation perspective there is no reason to split into two clusters. The key issue with index/shard operations is whether the cluster can handle the traffic volume. So if you do decide to split into two clusters, you will need to look at the load profile for each of your client types to determine how much raw processing power you need in each cluster. It may be that a 10:20 split between clusters is more optimal than a 15:15 split for balancing request traffic, and therefore CPU utilisation, across all nodes. If you go with one cluster this is not an issue, because you can move shards between nodes to balance the request traffic.
>
> Larger clusters also imply more work for the cluster master in managing the cluster. This comes down to the number of nodes that the master has to communicate with and manage, and the size of the cluster state. A cluster with 30 nodes is not too large for a master to keep track of. There will be an increase in network traffic associated with the increased volume of master-to-worker and worker-to-master pings used to detect the presence/absence of nodes. This can be offset by lengthening the respective ping intervals.
>
> In a large cluster it is good practice to have a group of dedicated master nodes, say 3, from which the master is elected. These nodes host no user data, so cluster management is not compromised by heavy user request traffic.
>
> The size of the cluster state may be more of an issue. The cluster state comprises all of the information about the cluster configuration: it has records for each node, index, document mapping, shard, etc. Whenever there is a change to the cluster state, it is first made by the master, which then sends the updated cluster state to each worker node. Note that the entire cluster state is sent, not just the changes! It is therefore highly desirable to limit both the frequency of changes to the cluster state, primarily by minimizing dynamic field mapping updates, and the overall size of the cluster state, primarily by minimizing the number of indices.
>
> In your proposed model, the cluster state associated with the set of 60 shared monthly indices will be larger than that of one set of 60 dedicated monthly indices, by virtue of having 100 shards per index rather than 6. However, it may not be much bigger, because there is far more metadata associated with defining the index structure, notably the field mappings for all document types in the index, than there is metadata defining the shards of the index. So it may well be that the cluster state for 60 "shared" monthly indices plus N sets of 60 "dedicated" indices is not much larger than that for (N + 1) sets of 60 "dedicated" indices, in which case there may not be much point in splitting into two clusters. A quick way to check this for your actual data model (see the sketch below) is to:
>
> 1. Set up an index in ES with mappings for all document types, 6 shards, and 0 replicas.
> 2. Retrieve the index metadata JSON using the ES admin API.
> 3. Increase the number of replicas to 16 (102 shards in total).
> 4. Retrieve the index metadata JSON again.
> 5. Compare the two JSON documents from steps 2 and 4.
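A rough sketch of that five-step check, again assuming the elasticsearch-py client; the index name and mappings are placeholders for the real set of document types. Since the per-shard records live in the routing table section of the cluster state, this pulls both the metadata and routing_table sections rather than metadata alone:

    import json

    from elasticsearch import Elasticsearch

    es = Elasticsearch()

    INDEX = "cluster-state-test"  # throwaway index for the experiment

    # Step 1: create the index with 6 shards, 0 replicas, and the mappings.
    es.indices.create(index=INDEX, body={
        "settings": {"number_of_shards": 6, "number_of_replicas": 0},
        "mappings": {"event": {"properties": {
            "tenantID": {"type": "string"},
            "timestamp": {"type": "date"},
        }}},
    })

    # Step 2: snapshot the index-related parts of the cluster state.
    state_6 = es.cluster.state(metric="metadata,routing_table", index=INDEX)

    # Step 3: raise replicas to 16, giving 6 * (1 + 16) = 102 shards total.
    es.indices.put_settings(index=INDEX, body={"number_of_replicas": 16})

    # Step 4: snapshot the same parts of the cluster state again.
    state_102 = es.cluster.state(metric="metadata,routing_table", index=INDEX)

    # Step 5: compare the serialized sizes of the two snapshots.
    print("6 shards:   %d bytes" % len(json.dumps(state_6)))
    print("102 shards: %d bytes" % len(json.dumps(state_102)))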
> As stated above, it is desirable to minimize the number of indices. Each shard is a Lucene index, which consumes memory and requires open file descriptors from the OS for segment data files and Lucene index-level files. You may find yourself running out of memory and/or file descriptors if you are not careful.
>
> I understand you are looking for a design that will cater for on-disk data volume. Given that your data is split into monthly indices, it may well be that no one index, either "shared" or "dedicated", will reach that volume in one month. There may also be seasonal factors to consider, whereby one or two months have much higher volumes than others. I have read/heard about cases where a monthly index architecture was implemented but later scrapped in favour of a single-index approach, because the month-to-month variation in volume was detrimental to overall system resource utilisation and performance.
>
> In your case, think about whether monthly indices are really appropriate. An alternative model is to partition one year's worth of data into a set of indices bounded by size rather than time. In this model a new index is started on Jan 01 and data is added to it until it reaches some predefined size limit, at which point a new index is created to accept new data from that point on. This is repeated until year end, at which point you might end up with data for Jan/Feb/Mar and half of Apr in index 2014-01, the rest of Apr plus May-Oct in index 2014-02, and Nov/Dec in index 2014-03. This way you end up using the smallest number of indices within the constraints of manageable shard size and overall user data volume. It may also be a better approach than indices with 100 shards. It does, however, come at the cost of more complexity when it comes to accessing data across multiple indices, but that is a one-off development cost rather than an ongoing maintenance cost.
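An illustrative sketch of that size-bounded model, under the same elasticsearch-py assumption; the "data-YYYY-NN" naming, the "event" type, and the 100 GB limit are all made-up parameters:

    from elasticsearch import Elasticsearch

    es = Elasticsearch()

    MAX_INDEX_BYTES = 100 * 1024 ** 3  # roll to a new index past ~100 GB


    def index_name(year, seq):
        return "data-%04d-%02d" % (year, seq)


    def primary_store_bytes(name):
        # Primary store size for the index, from the index stats API.
        stats = es.indices.stats(index=name, metric="store")
        return stats["indices"][name]["primaries"]["store"]["size_in_bytes"]


    def index_event(doc, year, seq):
        """Index into the active index, rolling over once it is full.

        Assumes index_name(year, seq) already exists (created on Jan 01).
        In practice you would check the size periodically rather than on
        every document.
        """
        name = index_name(year, seq)
        if primary_store_bytes(name) >= MAX_INDEX_BYTES:
            seq += 1
            name = index_name(year, seq)
            es.indices.create(index=name, ignore=400)  # 400 if it exists
        es.index(index=name, doc_type="event", body=doc)
        return seq  # the caller tracks the active sequence number

Querying a date range then requires knowing which sequence numbers cover the range, e.g. from a small lookup table mapping each index to its first and last timestamps; that is the one-off development cost mentioned above.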
> Also, rather than focusing so much on shard size, look at the number and size of the segment files comprising a shard. Remember that Lucene segments are essentially immutable (except for marking deletes) after they are written to disk. This means that some index management operations may reduce to simply copying/moving one or two segment files across the network, rather than all segment files for a given shard. For example, when allocating shards to nodes, Elasticsearch tries to allocate shards to nodes that already hold some of that shard's segment data. The new incremental backup/restore in ES v1 also takes advantage of this segment immutability. With this in mind, you might be able to support shards larger than 5 GB consisting of more segment files of bounded size, rather than fewer segment files of unbounded size (and ultimately a single 5 GB segment file).
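A small sketch for inspecting segment counts and sizes per shard via the indices segments API (GET /{index}/_segments), with the same elasticsearch-py assumption and the illustrative index name from earlier:

    from elasticsearch import Elasticsearch

    es = Elasticsearch()

    resp = es.indices.segments(index="events-2014.03")
    for index_name, data in resp["indices"].items():
        for shard_id, copies in data["shards"].items():
            for copy in copies:
                segs = copy["segments"]
                total = sum(s["size_in_bytes"] for s in segs.values())
                print("shard %s (%s): %d segments, %.1f MB total" % (
                    shard_id,
                    "primary" if copy["routing"]["primary"] else "replica",
                    len(segs),
                    total / (1024.0 * 1024.0)))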
> Hope this helps,
> Regards
> Mauri