The knapsack plugin does not require downtime. You can increase the shard count on the fly by copying an index over to another index (even on another cluster). The index should be write-disabled during the copy, though.
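Write-disabling an index is a one-line settings update. A minimal sketch of the request bodies involved, assuming a hypothetical index name "old_index" (the comments show the REST calls these bodies would be sent with):

```python
import json

# Hypothetical index name used for illustration.
index = "old_index"

# PUT /old_index/_settings
# Blocks index/update/delete operations but still allows reads,
# so nothing changes underneath the running copy.
write_block = {"index": {"blocks.write": True}}

# PUT /old_index/_settings
# Lift the block again once the copy has finished.
unblock = {"index": {"blocks.write": False}}

print(json.dumps(write_block))
```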
Increasing the replica level is a very simple command; no index copy is required.

It seems you have a slight misconception about controlling replica shards. You cannot start dedicated copy actions from a replica only. (By setting _preference for search, this works for queries.)

Maybe I do not understand your question, but what do you mean by "dual writes"? And why would you "move" an index? Please check out index aliases. The concept of index aliases allows redirecting index names in the API with a single atomic command.

It will be tough to monitor an outgrowing index, since there is no clear indication of the kind "this cluster's capacity is full because the index is too large or overloaded, please add nodes now". In real life, heaps will fill up here and there, latency will increase, and all of a sudden queries or indexing will congest now and then. If you encounter this, you have no time to copy an old index to a new one - the copy process also takes resources, and the cluster may not have enough. You must begin to add nodes well before the capacity limit is reached.

Instead of copying an index, which is a burden, you should consider managing a set of indices. If an old index is too small, just start a new one that is bigger, has more shards, and spans more nodes, and add it to the existing set. With an index alias you can combine many indices under one index name. This is very powerful.

If you cannot estimate the data growth rate, I also recommend using a reasonable number of shards from the very start. Say, if you expect 50 servers to run an ES node, then simply start with 50 shards on a small number of servers, and add servers over time. You won't have to bother about shard count for a very long time if you choose such a strategy.

Do not think about rivers; they are not built for such use cases. Rivers are designed as a "play tool" for fetching data quickly from external sources, for demo purposes.
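To make the two points above concrete, here is a sketch of the request bodies for raising the replica count and for atomically combining several indices under one alias. The index and alias names (logs-2014-05, logs-2014-06, logs) are made up for illustration:

```python
import json

# PUT /logs-2014-05/_settings
# Raising the replica count is a settings change only; no copy happens.
more_replicas = {"index": {"number_of_replicas": 2}}

# POST /_aliases
# A single atomic request: both indices now answer to the name "logs",
# so clients keep querying one logical index while the set grows.
alias_actions = {
    "actions": [
        {"add": {"index": "logs-2014-05", "alias": "logs"}},
        {"add": {"index": "logs-2014-06", "alias": "logs"}},
    ]
}

print(json.dumps(alias_actions, indent=2))
```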
They are discouraged for serious production use; they are not very reliable if they run unattended.

Jörg

On Thu, Jun 5, 2014 at 7:33 AM, Todd Nine <[email protected]> wrote:

>> 2) https://github.com/jprante/elasticsearch-knapsack might do what you
>> want.
>
> This won't quite work for us. We can't have any downtime, so it seems
> like an A/B system is more appropriate. What we're currently thinking is
> the following.
>
> Each index has 2 aliases, a read alias and a write alias.
>
> 1) Both read and write aliases point to an initial index, say shard count
> 5, replication 2. (ES is not our canonical data source, so we're OK with
> reconstructing search data.)
>
> 2) We detect via monitoring that we're going to outgrow an index. We
> create a new index with more shards, and potentially higher replication
> depending on read load. We then update the write alias to point to both
> the old and new index. All clients will then begin dual writes to both
> indexes.
>
> 3) While we're writing to old and new, some process (maybe a river?) will
> begin copying documents updated before the write-alias switch time from
> the old index to the new index. Ideally, it would be nice if each replica
> could copy only its local documents into the new index. We'll want to
> throttle this as well. Each node will need additional operational
> capacity to accommodate the dual writes as well as accepting the writes
> of the "old" documents. I'm concerned that if we push this through too
> fast, we could cause interruptions of service.
>
> 4) Once the copy is completed, the read alias is moved to the new index,
> then the old index is removed from the system.
>
> Could such a process be implemented as a plugin? If the work can happen
> in parallel across all nodes containing a shard, we can increase the
> process's speed dramatically. If we have a single worker, like a river,
> it might possibly take too long.
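The alias steps in the quoted plan can be sketched as _aliases request bodies. The names (idx_v1, idx_v2, idx_read, idx_write) are hypothetical. One caveat: an alias that spans several indices generally cannot accept index operations, so the dual writes in step 2 would have to be issued by the clients themselves rather than through the write alias - which is likely what Jörg's "dual writes" question is getting at:

```python
# POST /_aliases  (step 2, illustrative only)
# Registers the new index under the write alias; the clients still have
# to send each document to both indices explicitly.
dual_write = {
    "actions": [
        {"add": {"index": "idx_v2", "alias": "idx_write"}},
    ]
}

# POST /_aliases  (step 4)
# A single atomic cutover: reads move to the new index and the old index
# is detached from both aliases before being deleted.
cutover = {
    "actions": [
        {"remove": {"index": "idx_v1", "alias": "idx_read"}},
        {"add": {"index": "idx_v2", "alias": "idx_read"}},
        {"remove": {"index": "idx_v1", "alias": "idx_write"}},
    ]
}
```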
