Re: Performance issues when sending documents to multiple indexes at the same time.

Nawaaz Soogund Mon, 19 Jan 2015 01:38:43 -0800

Hi Mark.


Thanks for getting back to us. What are our options if we want to keep 
our customers' data separate, along the lines of a Chinese wall strategy? 
Although it is technically possible to store them together, we have other 
operational and business reasons to keep them separate.

I'll try one replica and 3 shards with our current setup, on one customer 
only, and post the findings :)
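
For a rough sense of what that change means, here is a back-of-the-envelope 
sketch in Python. It assumes the figures from the original message (5 
customers, 7 daily indexes each, 3 nodes) and an even spread of shards across 
nodes; the function name is made up for illustration.

```python
def shard_copies(indexes, primaries, replicas):
    """Total shard copies (primaries plus replica copies) in the cluster."""
    return indexes * primaries * (1 + replicas)

indexes = 5 * 7  # 5 customers, one index per day, 7 days kept
nodes = 3

# Test setup from the original message: 10 shards, 2 replicas per index.
current = shard_copies(indexes, primaries=10, replicas=2)
# Setup to try next: 3 shards, 1 replica per index.
proposed = shard_copies(indexes, primaries=3, replicas=1)

print(current, current // nodes)    # 1050 shard copies, 350 per node
print(proposed, proposed // nodes)  # 210 shard copies, 70 per node
```

With 50 to 100 customers the same arithmetic scales linearly, so the per-node 
count is worth recomputing before settling on a layout.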

Thanks

On Friday, January 16, 2015 at 11:15:29 PM UTC, Mark Walkom wrote:
>
> You've got too many replicas and shards. One shard per node (maybe 2) and 
> one replica is enough.
>
> You should be using the bulk API as well.
>
> What's your heap set to?
>
> Also consider combining customers into one index; it'll reduce the work 
> you need to do.
> On 17/01/2015 4:07 am, "Nawaaz Soogund" <[email protected]> wrote:
>
>> We are experiencing some performance issues or anomalies with 
>> Elasticsearch, specifically on a system we are currently building.
>>
>>  
>>
>> *The requirements:*
>>
>> We need to capture data for several of our customers, who will query 
>> and report on it in near real time. All the documents received have the 
>> same format and the same properties, and they are flat (all fields are 
>> of primitive types, with no nested objects). We want to keep each 
>> customer’s information separate from the others.
>>
>>  
>>
>> *Frequency of data received and queried:*
>>
>> We receive data for each customer at a fluctuating rate of 200 to 700 
>> documents per second – with the peak being in the middle of the day.
>>
>> Queries will mostly be aggregations over around 12 million documents per 
>> customer (histograms and percentiles to show patterns over time), plus the 
>> occasional raw document retrieval to find out what happened at a particular 
>> point in time. We are aiming to serve 50 to 100 customers with varying 
>> insert rates: the smallest could be 20 docs/sec, and the largest could 
>> peak at 1000 docs/sec for some minutes.
>>
>>  
>>
>> *How are we storing the data:*
>>
>> Each customer has one index per day. For example, if we have 5 customers, 
>> there will be a total of 35 indexes for the whole week. We split the data 
>> by day because it is mostly the latest two indexes that get queried, and 
>> the older ones only occasionally. It also lets us delete older indexes 
>> independently for each customer (some may want to keep 7 days' worth of 
>> data, some 14).
>>
>>  
>>
>> *How we are inserting:*
>>
>> We are sending data in batches of 10 to 2000 documents, every second. One 
>> document is around 900 bytes raw.
>>
>>  
>>
>> *Environment*
>>
>> AWS C3-Large – 3 nodes
>>
>> All indexes are created with 10 shards and 2 replicas for test purposes
>>
>> Both Elasticsearch 1.3.2 and 1.4.1
>>
>>  
>>
>> *What we have noticed:*
>>
>> If I push data to one index only, response time starts at 80 to 100 ms 
>> for each batch inserted when the insert rate is around 100 documents per 
>> second. I ramp it up and can reach 1600 before the response time climbs 
>> to close to 1 second per batch. When I increase it to close to 1700, it 
>> hits a wall at some point because of concurrent insertions, and the time 
>> spirals to 4 or 5 seconds. That said, if I reduce the insert rate, 
>> Elasticsearch recovers nicely. CPU usage increases as the rate increases.
>>
>>  
>>
>> If I push to 2 indexes concurrently, I can reach a total of 1100 and CPU 
>> goes up to 93% at around 900 documents per second.
>>
>> If I push to 3 indexes concurrently, I can reach a total of 150 and CPU 
>> goes up to 95 to 97%. I tried it many times. The interesting thing is that 
>> response time is around 109 ms at that point. I can increase the load to 
>> 900 and response time will still be around 400 to 600 ms, but CPU stays 
>> high.
>>
>>
>> *Question:*
>>
>> Looking at our requirements and findings above, is the design suitable 
>> for what is being asked? Are there any tests I can run to find out more? 
>> Are there any settings I need to check (and change)?
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/76ecd8bb-97cc-4125-8f1a-50de69c2790f%40googlegroups.com
>> For more options, visit https://groups.google.com/d/optout.
>>
>
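
On the bulk API suggestion quoted above: a bulk request body is 
newline-delimited JSON, with one action line followed by one source line per 
document, so a single request can carry documents for several indexes. The 
sketch below only builds such a body. The index naming scheme, the `event` 
type name and the sample fields are assumptions for illustration and are not 
taken from this thread.

```python
import json
from datetime import date

def daily_index(customer, day):
    # Hypothetical naming scheme: one index per customer per day.
    return "{}-{}".format(customer, day.strftime("%Y.%m.%d"))

def bulk_body(docs, day, doc_type="event"):
    """Build a newline-delimited bulk body from (customer, document) pairs."""
    lines = []
    for customer, doc in docs:
        action = {"index": {"_index": daily_index(customer, day),
                            "_type": doc_type}}
        lines.append(json.dumps(action))  # action line
        lines.append(json.dumps(doc))     # source line
    return "\n".join(lines) + "\n"  # the body must end with a newline

body = bulk_body(
    [("customer-a", {"latency_ms": 12}), ("customer-b", {"latency_ms": 48})],
    date(2015, 1, 19),
)
print(body)
```

The resulting string would be sent as the body of a POST to the `_bulk` 
endpoint. The `_type` field applies to the 1.x versions discussed here.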

