There are a few people in the IRC channel who have done it; however, cross-WAN clusters are generally not recommended, as ES is sensitive to latency. You may be better off using the snapshot/restore process, or another export/import method.

Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email: [email protected]
web: www.campaignmonitor.com
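A snapshot/restore round trip between clusters looks roughly like the sketch below. The repository name, filesystem location, and index name are placeholders, and the repository location has to be reachable from both clusters (a shared filesystem here; S3 via the AWS cloud plugin is the usual alternative):

# register a filesystem snapshot repository
curl -XPUT 'localhost:9200/_snapshot/region_backup' -d '{
  "type": "fs",
  "settings": { "location": "/mnt/shared/region_backup" }
}'

# snapshot a single index
curl -XPUT 'localhost:9200/_snapshot/region_backup/snap_1?wait_for_completion=true' -d '{
  "indices": "app1-index1"
}'

# on the destination cluster, after registering the same repository:
curl -XPOST 'localhost:9200/_snapshot/region_backup/snap_1/_restore'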
On 11 June 2014 03:11, Todd Nine <[email protected]> wrote:

Hey guys,

One last question: does anyone do multi-region replication with ES? My current understanding is that in a multi-region cluster, documents will be routed to the region with a node that "owns" the shard the document is being written to. In our use case, the cluster must survive a WAN outage, and we don't want the latency of writes or reads crossing the WAN connection. Our documents are immutable, so we can work with multi-region writes; we simply need to replicate the writes to the other regions, as well as the deletes. Are there any examples or implementations of this?

Thanks,
Todd

On Thursday, June 5, 2014 4:11:44 PM UTC-6, Jörg Prante wrote:

Yes, routing is very powerful. The general use case is to introduce a mapping to a large number of shards so you can store related parts of the data on the same shard, which is good for locality. For example, combined with an index alias working on filter terms, you can create one big concrete index and use segments of it, so that many thousands of users can share a single index.

Another use case for routing is a time-windowed index where each shard holds a time window; there are many examples around Logstash.

The combination of index alias and routing is also known as "shard overallocation". The concept might look complex at first, but the other option would be to create a concrete index for every user, which might also be a waste of resources.

Though some here on this list have managed to run tens of thousands of shards on a single node, I still find this breathtaking; a few dozen shards per node should be OK. Each shard takes some MB on the heap (there are tricks to reduce this a bit), and a high number of shards consumes resources even without executing a single query. There might be other factors worth considering, for example a size limit for a single shard. It can be quite handy to let ES move around shards of 1-10 GB instead of a few 100 GB, as it is faster at index recovery and at reallocation time.

Jörg
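A sketch of the alias-plus-routing pattern Jörg describes, with hypothetical index, alias, and field names: each tenant gets a filtered alias over one big index, and the routing value pins all of that tenant's documents to a single shard:

curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions": [
    { "add": {
        "index": "big_index",
        "alias": "user_42",
        "filter": { "term": { "user_id": "42" } },
        "routing": "42"
    }}
  ]
}'

Indexing and searching through "user_42" then touches only the one shard the routing value hashes to, instead of fanning out across all shards.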
On Thu, Jun 5, 2014 at 9:44 PM, Todd Nine <[email protected]> wrote:

Hey Jörg,

Thanks for the reply. We're using Cassandra heavily in production, so I'm very familiar with the scale-out concepts. What we've seen in all our distributed systems is that at some point you saturate the capacity of a single node. In the case of ES, that point would seem to be shard count: eventually all 5 shards become too large for a node to handle updates and reads efficiently, whether through a high number of documents, document size, or both. Once we reach this state, that index is "full" in the sense that the nodes containing it can no longer service traffic at the rate we need. We have 2 options.

1) Get bigger hardware. We do this occasionally, but it's not ideal since this is a distributed system.

2) Scale out, as you said. For write throughput, it seems we can do this with a pattern of alias + new index, but it's not clear to me whether that's the right approach. My initial thinking is to define some sort of routing to the new index that pivots on created date, since that's an immutable field. Thoughts?

For read throughput, we can create more replicas. Our system is about 50/50 now; some users are evenly read/write, others are very read-heavy. I'll probably come up with 2 indexing strategies we can apply to an application's index, based on heuristics from the operations it performs.

Thanks for the feedback!
Todd

On Thu, Jun 5, 2014 at 10:55 AM, [email protected] <[email protected]> wrote:

Thanks for raising the questions; I will come back later in more detail.

Just a quick note: the idea that "shards scale writes" and "replicas scale reads" is correct, but Elasticsearch is also "elastic", which means it "scales out" by adding node hardware. The shard/replica scaling pattern finds its limits in node hardware, because shards and replicas are tied to machines, and there are hard resource constraints, mostly disk I/O and memory related.

In the end, you can take as a rule of thumb:

- add replicas to scale "read" load
- add new indices (i.e. new shards) to scale "write" load
- add nodes to scale out the whole cluster for both read and write load

More later,

Jörg
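The first two rules map onto two API calls, sketched here with illustrative index names and counts. Replica count can be raised on a live index, while shard count is fixed at index creation, which is why scaling writes means adding a new index:

# scale reads: raise the replica count of an existing index on the fly
curl -XPUT 'localhost:9200/app1-index1/_settings' -d '{
  "index": { "number_of_replicas": 3 }
}'

# scale writes: create an additional index with more shards
curl -XPUT 'localhost:9200/app1-index2' -d '{
  "settings": { "number_of_shards": 10, "number_of_replicas": 2 }
}'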
On Thu, Jun 5, 2014 at 7:17 PM, Todd Nine <[email protected]> wrote:

Hey Jörg,

Thank you for your response. A few questions/points.

In our use cases, the inability to write or read is considered downtime, so I cannot disable writes during expansion. Your alias points raise some interesting research I need to do, and I have a few follow-up questions.

Our systems are fully multi-tenant. We currently intend to have 1 index per application, and each application can have a large number of types within its index. Each cluster could potentially hold 1000 or more applications/indexes. Most users will never need more than 5 shards, but some are huge power users, with billions of documents across several large types. These are the users I am concerned with.

From my understanding and experimentation, Elasticsearch has 2 primary mechanisms for tuning performance under load. First is the shard count: the higher the shard count, the more writes you can accept. Each shard has a primary which accepts the write and replicates it to its replicas, so for high write throughput you increase the shard count to distribute the load across more nodes. For read throughput, you increase the replica count; this gives you higher read performance, since you now have more than 1 node per shard that you can query for results.

Per your suggestion, rather than copy the documents from an index with 5 shards to an index with 10 shards, I can theoretically create a new index and then add it to the alias. I envision this in the following way.

Aliases:
app1-read
app1-write

Initial creation:
app1-read -> Index: App1-index1 (5 shards, 2 replicas)
app1-write -> Index: App1-index1

The user begins to have too much data in App1-index1, and the 5 shards are causing hotspots. The following actions take place:

1) Create App1-index2 (5 shards, 2 replicas)
2) Update app1-write -> App1-index1 and App1-index2
3) Update app1-read -> App1-index1 and App1-index2

I have some uncertainty around this I could use help with.

Once the new index has been added, how are requests routed? For instance, if I have a document "doc1" in App1-index1 and I delete it after adding the new index, is the alias smart enough to update App1-index1, or will it broadcast the operation to both indexes? In other words, if I create an alias over 2 or more indexes, will the alias perform routing, or is every operation a broadcast to all indexes? How does this scale in practice?

In my current understanding, using an alias on read is simply the same as if you had doubled the shards: a 10-shard (replication 2) index is functionally equivalent to an alias aggregating 2 indexes of 5 shards and 2 replicas each, and all 10 shards (or one of their replicas) would need to be read to aggregate the results, correct?

Lastly, we have a multi-region requirement. We are eventually consistent by design between regions: we want documents written in one region to be replicated to the others for query. All documents are immutable, by design, so we don't get document version collisions between data centers. What mechanisms are currently used in production environments to replicate indexes across regions? I just can't seem to find any. Rivers were my initial thought, so that regions can pull data from other regions, but if that isn't a good fit, what is?

Thanks guys!
Todd

On Thu, Jun 5, 2014 at 9:21 AM, [email protected] <[email protected]> wrote:

The knapsack plugin does not require downtime. You can increase shards on the fly by copying an index over to another index (even on another cluster). The index should be write-disabled during the copy, though.

Increasing the replica level is a very simple command; no index copy is required.

It seems you have a slight misconception about controlling replica shards: you can not start dedicated copy actions from the replicas only. (By setting _preference for search, this works for queries.)

Maybe I do not understand your question, but what do you mean by "dual writes"? And why would you "move" an index?

Please check index aliases. Index aliases allow redirecting index names in the API with a single atomic command.

It will be tough to monitor an outgrowing index, since there is no clear signal of the kind "this cluster's capacity is full because the index is too large or overloaded, please add nodes now". In real life, heaps fill up here and there, latency increases, and all of a sudden queries or indexing congest now and then. If you encounter this, you have no time to copy an old index to a new one; the copy process also takes resources, and the cluster may not have enough. You must begin to add nodes well before the capacity limit is reached.

Instead of copying an index, which is a burden, you should consider managing a group of indices. If an old index is too small, just start a new one which is bigger, has more shards, and spans more nodes, and add it to the existing set of indices. With an index alias you can combine many indices under one index name. This is very powerful.

If you can not estimate the data growth rate, I also recommend using a generous number of shards from the very start. Say, if you expect to eventually run ES nodes on 50 servers, then simply start with 50 shards on a small number of servers and add servers over time. You won't have to bother about shard count for a very long time with such a strategy.

Do not think about rivers; they are not built for such use cases. Rivers are designed as a "play tool" for fetching data quickly from external sources, for demo purposes. They are discouraged for serious production use, as they are not very reliable when running unattended.

Jörg
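The atomic command Jörg refers to is the _aliases endpoint: all actions in one request take effect together, so clients never observe a half-switched state. Steps 2 and 3 of the plan above would look roughly like:

curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions": [
    { "add": { "index": "App1-index2", "alias": "app1-write" } },
    { "add": { "index": "App1-index2", "alias": "app1-read" } }
  ]
}'

One caveat: a plain index request addressed to an alias that resolves to more than one index is rejected, so once app1-write spans both indexes, the dual writes have to be issued by the clients to each index directly.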
On Thu, Jun 5, 2014 at 7:33 AM, Todd Nine <[email protected]> wrote:

> 2) https://github.com/jprante/elasticsearch-knapsack might do what you want.

This won't quite work for us. We can't have any downtime, so it seems like an A/B system is more appropriate. What we're currently thinking is the following.

Each index has 2 aliases, a read alias and a write alias.

1) Both aliases point to an initial index, say shard count 5, replication 2. (ES is not our canonical data source, so we're OK with reconstructing search data.)

2) We detect via monitoring that we're going to outgrow the index. We create a new index with more shards, and potentially a higher replication level depending on read load. We then update the write alias to point to both the old and the new index, and all clients begin dual writes to both indexes.

3) While we're writing to both old and new, some process (maybe a river?) begins copying documents updated before the write-alias switch from the old index to the new one. Ideally, each replica could copy only its local documents into the new index. We'll want to throttle this as well: each node needs additional operational capacity to accommodate the dual writes on top of accepting the writes of the "old" documents, and I'm concerned that if we push this through too fast, we could cause interruptions of service.

4) Once the copy is complete, the read alias is moved to the new index, and the old index is removed from the system.

Could such a process be implemented as a plugin? If the work can happen in parallel across all nodes containing a shard, we can increase the process's speed dramatically. If we have a single worker, like a river, it might take too long.
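A minimal client-side sketch of the copy in step 3, using scan/scroll plus the bulk API instead of a plugin. The timestamp field "updated" and the cutoff value are assumptions; a real version would loop, passing each response's _scroll_id to the next request and converting each page of hits into bulk index actions. Throttling falls out of the page size and the delay between scroll pulls:

# open a scan over documents older than the alias switch
curl -XGET 'localhost:9200/App1-index1/_search?search_type=scan&scroll=5m&size=500' -d '{
  "query": { "range": { "updated": { "lt": "2014-06-05T00:00:00Z" } } }
}'

# pull the next page, sending the _scroll_id from the previous response
curl -XGET 'localhost:9200/_search/scroll?scroll=5m' -d '<scroll_id from previous response>'

# reindex each converted page into the new index
curl -XPOST 'localhost:9200/App1-index2/_bulk' --data-binary @converted_page.json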
