Thanks for raising the questions, I will come back later in more detail. Just a quick note: the idea that "shards scale writes" and "replicas scale reads" is correct, but Elasticsearch is also "elastic", which means it scales out by adding node hardware. The shard/replica scaling pattern finds its limits in the node hardware, because shards and replicas are tied to machines, and those come with hard resource constraints, mostly around disk I/O and memory.
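To illustrate, here is a minimal sketch using the Python client (elasticsearch-py; the host, index name and numbers are just placeholders, not part of this thread): the shard count is fixed when an index is created, while the replica count can be raised on a live index, provided there are nodes to place the extra copies on.

    from elasticsearch import Elasticsearch

    # adjust the host for your cluster
    es = Elasticsearch(["localhost:9200"])

    # Shard count is fixed at index creation time ...
    es.indices.create(index="app1-index1", body={
        "settings": {"number_of_shards": 5, "number_of_replicas": 2}
    })

    # ... but the replica count can be changed later to scale reads,
    # as long as there are enough nodes to host the new replicas.
    es.indices.put_settings(index="app1-index1", body={
        "index": {"number_of_replicas": 3}
    })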
In the end, you can take as a rule of thumb:

- add replicas to scale "read" load
- add new indices (i.e. new shards) to scale "write" load
- and add nodes to scale out the whole cluster, for both read and write load

More later,

Jörg

On Thu, Jun 5, 2014 at 7:17 PM, Todd Nine <[email protected]> wrote:

> Hey Jörg,
>
> Thank you for your response. A few questions/points.
>
> In our use cases, the inability to write or read is considered downtime. Therefore, I cannot disable writes during expansion. Your alias points raise some interesting research I need to do, and I have a few follow-up questions.
>
> Our systems are fully multi-tenant. We currently intend to have 1 index per application. Each application can have a large number of types within its index. Each cluster could potentially hold 1000 or more applications/indexes. Most users will never need more than 5 shards. Some users are huge power users, with billions of documents per type and several large types. These are the users I am concerned with.
>
> From my understanding and experimentation, Elasticsearch has 2 primary mechanisms for tuning performance to handle the load. First is the shard count. The higher the shard count, the more writes you can accept. Each shard has a primary which accepts the write and replicates it to its replicas. For high write throughput, you increase the count of shards to distribute the load across more nodes. For read throughput, you increase the replica count. This gives you higher performance on read, since you now have more than 1 node per shard you can query to get results.
>
> Per your suggestion, rather than copy the documents from an index with 5 shards to an index with 10 shards, I can theoretically create a new index and then add it to the alias. For instance, I envision this in the following way.
>
> Aliases:
> app1-read
> app1-write
>
> Initial creation:
> app1-read -> Index: App1-index1 (5 shards, 2 replicas)
> app1-write -> Index: App1-index1
>
> User begins to have too much data in App1-index1. The 5 shards are causing hotspots. The following actions take place.
>
> 1) Create App1-index2 (5 shards, 2 replicas)
> 2) Update app1-write -> App1-index1 and App1-index2
> 3) Update app1-read -> App1-index1 and App1-index2
>
> I have some uncertainty around this I could use help with.
>
> Once the new index has been added, how are requests routed? For instance, if I have a document "doc1" in App1-index1, and I delete it after adding the new index, is the alias smart enough to update App1-index1, or will it broadcast the operation to both indexes? In other words, if I create an alias with 2 or more indexes, will the alias perform routing, or is it a broadcast to all indexes? How does this scale in practice?
>
> In my current understanding, using an alias on read is simply going to be the same as if you doubled shards. If you have a 10-shard (replication 2) index, this is functionally equivalent to an alias that aggregates 2 indexes with 5 shards and 2 replicas each. All 10 shards (or one of their replicas) would need to be read to aggregate the results, correct?
>
> Lastly, we have a multi-region requirement. We are eventually consistent by design between regions. We want documents written in one region to be replicated to another for query. All documents are immutable. This is by design, so we don't get document version collisions between data centers.
> What are some current mechanisms in use in production environments to replicate indexes across regions? I just can't seem to find any. Rivers were my initial thinking, so regions can pull data from other regions. However, if this isn't a good fit, what is?
>
> Thanks guys!
> Todd
>
> On Thu, Jun 5, 2014 at 9:21 AM, [email protected] <[email protected]> wrote:
>
>> The knapsack plugin does not come with downtime. You can increase shards on the fly by copying an index over to another index (even on another cluster). The index should be write-disabled during the copy, though.
>>
>> Increasing the replica level is a very simple command, no index copy required.
>>
>> It seems you have a slight misconception about controlling replica shards. You cannot start dedicated copy actions from the replicas only. (By setting _preference for search, this works for queries.)
>>
>> Maybe I do not understand your question, but what do you mean by "dual writes"? And why would you "move" an index?
>>
>> Please check the index aliases. The concept of index aliases allows redirecting index names in the API with a simple atomic command.
>>
>> It will be tough to monitor an outgrowing index, since there is no clear indication of the kind "this cluster's capacity is full because the index is too large or overloaded, please add nodes now". In real life, heaps will fill up here and there, latency will increase, and all of a sudden queries or indexing will congest now and then. If you encounter this, you have no time to copy an old index to a new one - the copy process also takes resources, and the cluster may not have enough. You must begin to add nodes well before the capacity limit is reached.
>>
>> Instead of copying an index, which is a burden, you should consider managing a set of indices. If an old index is too small, just start a new one which is bigger, has more shards and spans more nodes, and add it to the existing set of indices. With an index alias you can combine many indices under one index name. This is very powerful.
>>
>> If you cannot estimate the data growth rate, I also recommend using a reasonable number of shards from the very start. Say, if you expect to eventually run ES nodes on 50 servers, then simply start with 50 shards on a small number of servers, and add servers over time. You won't have to bother about shard count for a very long time if you choose such a strategy.
>>
>> Do not think about rivers, they are not built for such use cases. Rivers are designed as a "play tool" for fetching data quickly from external sources, for demo purposes. They are discouraged for serious production use; they are not very reliable if they run unattended.
>>
>> Jörg
>>
>> On Thu, Jun 5, 2014 at 7:33 AM, Todd Nine <[email protected]> wrote:
>>
>>>> 2) https://github.com/jprante/elasticsearch-knapsack might do what you want.
>>>
>>> This won't quite work for us. We can't have any downtime, so it seems like an A/B system is more appropriate. What we're currently thinking is the following.
>>>
>>> Each index has 2 aliases, a read and a write alias.
>>>
>>> 1) Both read and write aliases point to an initial index. Say shard count 5, replication 2 (ES is not our canonical data source, so we're ok with reconstructing search data).
>>>
>>> 2) We detect via monitoring that we're going to outgrow an index. We create a new index with more shards, and potentially a higher replication depending on read load.
>>> We then update the write alias to point to both the old and the new index. All clients will then begin dual writes to both indexes.
>>>
>>> 3) While we're writing to old and new, some process (maybe a river?) will begin copying documents updated before the write alias time from the old index to the new index. Ideally, it would be nice if each replica could copy only its local documents into the new index. We'll want to throttle this as well. Each node will need additional operational capacity to accommodate the dual writes as well as accepting the writes of the "old" documents. I'm concerned that if we push this through too fast, we could cause interruptions of service.
>>>
>>> 4) Once the copy is completed, the read alias is moved to the new index, then the old index is removed from the system.
>>>
>>> Could such a process be implemented as a plugin? If the work can happen in parallel across all nodes containing a shard, we can increase the process's speed dramatically. If we have a single worker, like a river, it might take too long.
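For reference, a minimal sketch of the alias setup and switch described in the quoted thread above, again with the Python client (the index and alias names are the ones from Todd's example, lowercased; treat this as an untested sketch, not a recipe):

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["localhost:9200"])

    # Initial creation: both aliases point at the first index
    # (assumes app1-index1 already exists, as in the earlier sketch).
    es.indices.update_aliases(body={"actions": [
        {"add": {"index": "app1-index1", "alias": "app1-read"}},
        {"add": {"index": "app1-index1", "alias": "app1-write"}},
    ]})

    # When app1-index1 starts to run hot: create the second index
    # and add it to both aliases in one atomic call.
    es.indices.create(index="app1-index2", body={
        "settings": {"number_of_shards": 5, "number_of_replicas": 2}
    })
    es.indices.update_aliases(body={"actions": [
        {"add": {"index": "app1-index2", "alias": "app1-write"}},
        {"add": {"index": "app1-index2", "alias": "app1-read"}},
    ]})

    # Once the old documents have been copied over, drop the old
    # index from both aliases and delete it.
    es.indices.update_aliases(body={"actions": [
        {"remove": {"index": "app1-index1", "alias": "app1-write"}},
        {"remove": {"index": "app1-index1", "alias": "app1-read"}},
    ]})
    es.indices.delete(index="app1-index1")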
