I haven't heard of a hard limit on the number of indexes; obviously the more
you have, the larger the cluster state that needs to be maintained.

You might want to look into routing (
http://exploringelasticsearch.com/advanced_techniques.html or
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-routing-field.html)
as an alternative that can optimise and minimise the index count.
You can also hedge your bets and create an index with a larger number of
shards, i.e. not a 1:1 shard-to-node relationship, and then move the excess
shards to new nodes as they are added.
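
To make the routing idea concrete, here's a rough sketch (Python, with
zlib.crc32 standing in for Elasticsearch's internal hash; this shows the
general hash-mod-shards idea, not the exact implementation):

```python
import zlib

def shard_for(routing_value: str, num_primary_shards: int) -> int:
    """Pick a primary shard from a routing value (simplified illustration).

    Elasticsearch hashes the routing value (by default the document id) and
    takes it modulo the primary shard count, so every document with the same
    routing value lands on the same shard -- and a search that passes that
    routing value only needs to query one shard instead of all of them.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards

# Overallocating shards (e.g. 10 primaries on a 3-node cluster) leaves
# headroom: as nodes are added, excess shards can be relocated to them
# without changing this mapping or reindexing anything.
shard = shard_for("customer-42", 10)
```

This is also why the shard count is fixed at index creation: changing
`num_primary_shards` changes the result of the modulo for existing documents.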

I'd be interested to see how you'd measure that you've outgrown an index,
though; technically it can just keep growing until the node can no longer
deal with it. This is what testing is good for: throw data at a
single-shard index, and when it falls over you have an indicator of how
your hardware will handle things.

As for reading the transaction log and searching it, you might be playing a
losing game, as your code to parse and search it would have to be extremely
quick to be worth doing.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: [email protected]
web: www.campaignmonitor.com


On 5 June 2014 15:33, Todd Nine <[email protected]> wrote:

> Thanks for the answers Mark.  See inline.
>
>
> On Wed, Jun 4, 2014 at 3:51 PM, Mark Walkom <[email protected]>
> wrote:
>
>> 1) The answer is: it depends. You want to set up a test system with
>> indicative specs, and then throw sample data at it until things start
>> to break. However, this may help:
>> https://www.found.no/foundation/sizing-elasticsearch/
>>
>
> This is what I was expecting.  Thanks for the pointer to the
> documentation.  We're going to have some pretty beefy clusters (SSDs in
> RAID 0, 8 to 16 cores, and a lot of RAM) to power ES.  We're going to have
> a LOT of indexes, as we would be operating this as a core infrastructure
> service.  Is there an upper limit on the number of indexes a cluster can
> hold?
>
>
>> 2) https://github.com/jprante/elasticsearch-knapsack might do what you
>> want.
>>
>
> This won't quite work for us.  We can't have any downtime, so it seems
> like an A/B system is more appropriate.  What we're currently thinking is
> the following.
>
> Each index has 2 aliases, a read and a write alias.
>
> 1) Both read and write aliases point to an initial index, say shard count
> 5, replication factor 2 (ES is not our canonical data source, so we're OK
> with reconstructing search data).
>
> 2) We detect via monitoring that we're going to outgrow an index. We
> create a new index with more shards, and potentially a higher replication
> factor depending on read load.  We then update the write alias to point to
> both the old and new index.  All clients will then begin dual writes to
> both indexes.
>
> 3) While we're writing to both old and new, some process (maybe a river?)
> will begin copying documents updated before the write-alias change from
> the old index to the new index.  Ideally, it would be nice if each replica
> could copy only its local documents into the new index.  We'll want to
> throttle this as well.  Each node will need additional operational
> capacity to accommodate the dual writes as well as the writes of the "old"
> documents.  I'm concerned that if we push this through too fast, we could
> cause interruptions of service.
>
>
> 4) Once the copy is complete, the read alias is moved to the new index,
> and the old index is removed from the system.
>
> Could such a process be implemented as a plugin?  If the work can happen
> in parallel across all nodes containing a shard, we can increase the
> process's speed dramatically.  If we have a single worker, like a river,
> it might simply take too long.
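
For what it's worth, the alias choreography in steps 1-4 above can be
sketched as a toy state machine (hypothetical index and alias names; in a
real cluster each transition would be an atomic indices-aliases API call):

```python
# Toy model of the read/write alias migration described above.
# Aliases are modeled as plain sets of index names.

read_alias = {"things_v1"}
write_alias = {"things_v1"}

# Step 2: a new, larger index is created and dual writes begin.
write_alias.add("things_v2")
assert write_alias == {"things_v1", "things_v2"}
assert read_alias == {"things_v1"}  # reads still hit the old index only

# Step 3: a throttled backfill copies pre-cutover documents from
# things_v1 into things_v2 while dual writes continue.

# Step 4: once the copy completes, flip reads and retire the old index.
read_alias = {"things_v2"}
write_alias = {"things_v2"}
assert read_alias == write_alias == {"things_v2"}
```

The point of modeling it this way: at every step, readers see exactly one
consistent index, and writers never miss a document destined for the new one.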
>
>
>> 3) How real time is real time? You can change index.refresh_interval to
>> something small so that window of "unflushed" items is minimal, but that
>> will have other impacts.
>>
>
> Once the index call returns to the caller, the document should be
> immediately available for query.  We've tried lowering the refresh
> interval; this results in a pretty significant drop in throughput.  To
> meet our throughput requirements, we're considering even turning it up to
> 5 or 15 seconds.  If we could then search the data that's in our commit
> log (via storing it in memory until flush), that would be ideal.
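
For reference, the refresh interval can be changed on a live index via the
update-settings API (a sketch; the index name and host here are placeholders):

```shell
# Raise the refresh interval to 15s, trading search freshness for
# indexing throughput ("myindex" and localhost are hypothetical).
curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
  "index": { "refresh_interval": "15s" }
}'
```

Because it's a dynamic setting, it can also be lowered again once a bulk
load finishes.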
>
> Thoughts?
>
>
>
>> Regards,
>> Mark Walkom
>>
>> Infrastructure Engineer
>> Campaign Monitor
>> email: [email protected]
>> web: www.campaignmonitor.com
>>
>>
>> On 5 June 2014 04:18, Todd Nine <[email protected]> wrote:
>>
>>>  Hi All,
>>>   We've been using Elasticsearch as the search index for our new
>>> persistence implementation.
>>>
>>> https://usergrid.incubator.apache.org/
>>>
>>> I have a few questions I could use a hand with.
>>>
>>> 1) Is there any good documentation on the upper limit to count of
>>> documents, or total index size, before you need to allocate more shards?
>>>  Do shards have a real-world limit on size or number of entries to keep
>>> response times low?  Every system has its limits, and I'm trying to find
>>> some actual data on the size limits.  I've been trawling Google for some
>>>
>>>
>>> 2) Currently, it's not possible to increase the shard count for an
>>> index. The workaround is to create a new index with a higher count, and
>>> move documents from the old index into the new.  Could this be accomplished
>>> via a plugin?
>>>
>>>
>>> 3) We sometimes have "realtime" requirements, meaning that when an index
>>> call returns, the document is available for search.  Flushing explicitly
>>> is not a good idea from a performance perspective.  Has anyone explored
>>> searching the not-yet-flushed documents in memory and merging them with
>>> the Lucene results?  Is this something that could feasibly be implemented
>>> via a plugin?
>>>
>>> Thanks in advance!
>>> Todd
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>>
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/940c6404-6667-4846-b457-977e705d3797%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>

