The knapsack plugin does not require downtime. You can increase the shard count on the fly by copying an index over to another index (even on another cluster). The index should be write-disabled during the copy, though.
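Write-disabling an index is a one-line settings update. A minimal sketch of the request bodies involved, assuming a hypothetical index name "old_index" (the comments show the REST calls these bodies would be sent with):

```python
import json

# Hypothetical index name used for illustration.
index = "old_index"

# PUT /old_index/_settings
# Blocks index/update/delete operations but still allows reads,
# so nothing changes underneath the running copy.
write_block = {"index": {"blocks.write": True}}

# PUT /old_index/_settings
# Lift the block again once the copy has finished.
unblock = {"index": {"blocks.write": False}}

print(json.dumps(write_block))
```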
Increasing the replica level is a very simple command; no index copy is required.

It seems you have a slight misconception about controlling replica shards. You cannot start dedicated copy actions from a replica only. (By setting _preference for search, this works for queries.)

Maybe I do not understand your question, but what do you mean by "dual writes"? And why would you "move" an index? Please check out index aliases. The concept of index aliases allows redirecting index names in the API with a single atomic command.

It will be tough to monitor an outgrowing index, since there is no clear indication of the kind "this cluster's capacity is full because the index is too large or overloaded, please add nodes now". In real life, heaps will fill up here and there, latency will increase, and all of a sudden queries or indexing will congest now and then. If you encounter this, you have no time to copy an old index to a new one - the copy process also takes resources, and the cluster may not have enough. You must begin to add nodes well before the capacity limit is reached.

Instead of copying an index, which is a burden, you should consider managing a set of indices. If an old index is too small, just start a new one that is bigger, has more shards, and spans more nodes, and add it to the existing set. With an index alias you can combine many indices under one index name. This is very powerful.

If you cannot estimate the data growth rate, I also recommend using a reasonable number of shards from the very start. Say, if you expect 50 servers to run an ES node, then simply start with 50 shards on a small number of servers, and add servers over time. You won't have to bother about shard count for a very long time if you choose such a strategy.

Do not think about rivers; they are not built for such use cases. Rivers are designed as a "play tool" for fetching data quickly from external sources, for demo purposes.
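To make the two points above concrete, here is a sketch of the request bodies for raising the replica count and for atomically combining several indices under one alias. The index and alias names (logs-2014-05, logs-2014-06, logs) are made up for illustration:

```python
import json

# PUT /logs-2014-05/_settings
# Raising the replica count is a settings change only; no copy happens.
more_replicas = {"index": {"number_of_replicas": 2}}

# POST /_aliases
# A single atomic request: both indices now answer to the name "logs",
# so clients keep querying one logical index while the set grows.
alias_actions = {
    "actions": [
        {"add": {"index": "logs-2014-05", "alias": "logs"}},
        {"add": {"index": "logs-2014-06", "alias": "logs"}},
    ]
}

print(json.dumps(alias_actions, indent=2))
```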
They are discouraged for serious production use; they are not very reliable if they run unattended.

Jörg

On Thu, Jun 5, 2014 at 7:33 AM, Todd Nine <[email protected]> wrote:

>> 2) https://github.com/jprante/elasticsearch-knapsack might do what you
>> want.
>
> This won't quite work for us. We can't have any downtime, so it seems
> like an A/B system is more appropriate. What we're currently thinking is
> the following.
>
> Each index has 2 aliases, a read alias and a write alias.
>
> 1) Both read and write aliases point to an initial index, say shard count
> 5, replication 2. (ES is not our canonical data source, so we're OK with
> reconstructing search data.)
>
> 2) We detect via monitoring that we're going to outgrow an index. We
> create a new index with more shards, and potentially higher replication
> depending on read load. We then update the write alias to point to both
> the old and new index. All clients will then begin dual writes to both
> indexes.
>
> 3) While we're writing to old and new, some process (maybe a river?) will
> begin copying documents updated before the write-alias switch time from
> the old index to the new index. Ideally, it would be nice if each replica
> could copy only its local documents into the new index. We'll want to
> throttle this as well. Each node will need additional operational
> capacity to accommodate the dual writes as well as accepting the writes
> of the "old" documents. I'm concerned that if we push this through too
> fast, we could cause interruptions of service.
>
> 4) Once the copy is completed, the read alias is moved to the new index,
> then the old index is removed from the system.
>
> Could such a process be implemented as a plugin? If the work can happen
> in parallel across all nodes containing a shard, we can increase the
> process's speed dramatically. If we have a single worker, like a river,
> it might possibly take too long.
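The alias steps in the quoted plan can be sketched as _aliases request bodies. The names (idx_v1, idx_v2, idx_read, idx_write) are hypothetical. One caveat: an alias that spans several indices generally cannot accept index operations, so the dual writes in step 2 would have to be issued by the clients themselves rather than through the write alias - which is likely what Jörg's "dual writes" question is getting at:

```python
# POST /_aliases  (step 2, illustrative only)
# Registers the new index under the write alias; the clients still have
# to send each document to both indices explicitly.
dual_write = {
    "actions": [
        {"add": {"index": "idx_v2", "alias": "idx_write"}},
    ]
}

# POST /_aliases  (step 4)
# A single atomic cutover: reads move to the new index and the old index
# is detached from both aliases before being deleted.
cutover = {
    "actions": [
        {"remove": {"index": "idx_v1", "alias": "idx_read"}},
        {"add": {"index": "idx_v2", "alias": "idx_read"}},
        {"remove": {"index": "idx_v1", "alias": "idx_write"}},
    ]
}
```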
