Thanks for the answers, Mark. See inline.
On Wed, Jun 4, 2014 at 3:51 PM, Mark Walkom <[email protected]> wrote:

> 1) The answer is - it depends. You want to set up a test system with
> indicative specs, and then throw some sample data at it until things
> start to break. However this may help:
> https://www.found.no/foundation/sizing-elasticsearch/

This is what I was expecting. Thanks for the pointer to the documentation.
We're going to have some pretty beefy clusters (SSDs in RAID 0, 8 to 16
cores, and a lot of RAM) to power ES. We're also going to have a LOT of
indexes, since we would be operating this as a core infrastructure service.
Is there an upper limit on the number of indexes a cluster can hold?

> 2) https://github.com/jprante/elasticsearch-knapsack might do what you
> want.

This won't quite work for us. We can't have any downtime, so it seems like
an A/B system is more appropriate. What we're currently thinking is the
following. Each index has 2 aliases: a read alias and a write alias.

1) Both the read and write aliases point to an initial index, say with a
shard count of 5 and a replication factor of 2. (ES is not our canonical
data source, so we're OK with reconstructing search data.)

2) We detect via monitoring that we're going to outgrow the index. We
create a new index with more shards, and potentially a higher replication
factor depending on read load. We then update the write alias to point to
both the old and new indexes. All clients will then begin dual writes to
both indexes.

3) While we're writing to both old and new, some process (maybe a river?)
will begin copying documents last updated before the write-alias change
from the old index to the new index. Ideally, each replica could copy only
its local documents into the new index. We'll want to throttle this as
well: each node will need additional operational capacity to accommodate
the dual writes as well as accepting the writes of the "old" documents.
I'm concerned that if we push this through too fast, we could cause
interruptions of service.
4) Once the copy is completed, the read alias is moved to the new index,
then the old index is removed from the system.

Could such a process be implemented as a plugin? If the work can happen in
parallel across all nodes containing a shard, we can increase the
process's speed dramatically. If we have a single worker, like a river, it
might take too long.

> 3) How real time is real time? You can change index.refresh_interval to
> something small so that the window of "unflushed" items is minimal, but
> that will have other impacts.

Once the index call returns to the caller, the document should be
immediately available for query. We've tried lowering the refresh
interval; this results in a pretty significant drop in throughput. To meet
our throughput requirements, we're considering turning it up to 5 or 15
seconds instead. If we could then search the data that's still in our
commit log (by keeping it in memory until flush), that would be ideal.
Thoughts?

> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: [email protected]
> web: www.campaignmonitor.com
>
>
> On 5 June 2014 04:18, Todd Nine <[email protected]> wrote:
>
>> Hi All,
>>   We've been using Elasticsearch as our search index for our new
>> persistence implementation.
>>
>> https://usergrid.incubator.apache.org/
>>
>> I have a few questions I could use a hand with.
>>
>> 1) Is there any good documentation on the upper limit to the count of
>> documents, or total index size, before you need to allocate more
>> shards? Do shards have a real-world limit on size or number of entries
>> to keep response times low? Every system has its limits, and I'm
>> trying to find some actual data on the size limits. I've been trawling
>> Google for some answers, but I haven't really found any good test
>> results.
>>
>> 2) Currently, it's not possible to increase the shard count for an
>> index. The workaround is to create a new index with a higher count,
>> and move documents from the old index into the new.
>> Could this be accomplished via a plugin?
>>
>> 3) We sometimes have "realtime" requirements, in that when an index
>> call returns, the document is available. Flushing explicitly is not a
>> good idea from a performance perspective. Has anyone explored
>> searching the not-yet-flushed documents in memory and merging them
>> with the Lucene results? Is this something that's feasible to
>> implement via a plugin?
>>
>> Thanks in advance!
>> Todd

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group. To unsubscribe from this group and stop receiving
emails from it, send an email to
[email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CA%2Byzqf-VYZpBh6b_%2Br8W0fBs8b%3DU65gtjzt8PLe4uVx_b3nEDQ%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
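The alias cutover Todd describes (steps 2 and 4 of his plan) can be sketched as the request bodies for Elasticsearch's `POST /_aliases` endpoint. This is a rough illustration, not a tested procedure; the index and alias names (`orders_v1`, `orders_v2`, `orders_read`, `orders_write`) are hypothetical.

```python
def expand_write_alias(old_index, new_index, write_alias):
    """Step 2: point the write alias at both indexes so clients
    begin dual writes. Body for POST /_aliases."""
    return {
        "actions": [
            {"add": {"index": old_index, "alias": write_alias}},
            {"add": {"index": new_index, "alias": write_alias}},
        ]
    }


def cut_over_read_alias(old_index, new_index, read_alias, write_alias):
    """Step 4: once the copy completes, move reads to the new index
    and stop writing to the old one (the old index can then be
    deleted). Body for POST /_aliases."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": read_alias}},
            {"add": {"index": new_index, "alias": read_alias}},
            {"remove": {"index": old_index, "alias": write_alias}},
        ]
    }


# Example payloads for the hypothetical "orders" index family:
step2 = expand_write_alias("orders_v1", "orders_v2", "orders_write")
step4 = cut_over_read_alias("orders_v1", "orders_v2",
                            "orders_read", "orders_write")
```

One point in this scheme's favor: all actions in a single `_aliases` request are applied atomically, so readers never see a moment where the read alias points to no index during the swap.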
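The refresh-interval tradeoff discussed above (raising `index.refresh_interval` to 5 or 15 seconds to recover indexing throughput) amounts to a settings update. A minimal sketch of the request body for `PUT /<index>/_settings`, with the interval values taken from the discussion:

```python
def refresh_interval_settings(interval="5s"):
    """Body for PUT /<index>/_settings: trade refresh latency
    (how soon indexed documents become searchable) for indexing
    throughput. "5s" and "15s" are the intervals under consideration."""
    return {"index": {"refresh_interval": interval}}
```

The cost, as noted in the thread, is exactly the real-time gap Todd asks about in question 3: documents indexed since the last refresh are durable but not yet searchable.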
