Hi Mark.
Thanks for getting back to us.

What are our options if we want to keep our customers' data separate, along the lines of a Chinese wall strategy? Although it is technically possible to keep them together, we have other operational and business reasons for keeping them separate.

I'll try 3 shards with one replica on the setup we have, for one customer only, and post the findings :)

Thanks

On Friday, January 16, 2015 at 11:15:29 PM UTC, Mark Walkom wrote:
>
> You've got too many replicas and shards. One shard per node (maybe 2) and
> one replica is enough.
>
> You should be using the bulk API as well.
>
> What's your heap set to?
>
> Also consider combining customers into one index, it'll reduce the work
> you need to do.
> On 17/01/2015 4:07 am, "Nawaaz Soogund" <[email protected]> wrote:
>
>> We are experiencing some performance issues or anomalies with
>> Elasticsearch, specifically on a system we are currently building.
>>
>> *The requirements:*
>>
>> We need to capture data for several of our customers, who will query
>> and report on it in near real time. All the documents received have the
>> same format with the same properties and a flat structure (all fields
>> are of primitive type and there are no nested objects). We want to keep
>> each customer's information separate from the others.
>>
>> *Frequency of data received and queried:*
>>
>> We receive data for each customer at a fluctuating rate of 200 to 700
>> documents per second, with the peak being in the middle of the day.
>>
>> Queries will be mostly aggregations over around 12 million documents per
>> customer (histograms/percentiles to show patterns over time) and the
>> occasional raw document retrieval to find out what happened at a
>> particular point in time. We are aiming to serve 50 to 100 customers at
>> varying rates of documents inserted: the smallest one could be 20
>> docs/sec and the largest one could peak at 1000 docs/sec for some
>> minutes.
>>
>> *How are we storing the data:*
>>
>> Each customer has one index per day. For example, if we have 5 customers,
>> there will be a total of 35 indexes for the whole week. We split it per
>> day because it is mostly the latest two indexes that get queried, and
>> only occasionally the others. We also do it that way so we can delete
>> older indexes independently for each customer (some may want to keep 7
>> days' worth of data, some 14 days).
>>
>> *How we are inserting:*
>>
>> We send data in batches of 10 to 2000 documents every second. One
>> document is around 900 bytes raw.
>>
>> *Environment*
>>
>> AWS C3-Large, 3 nodes
>>
>> All indexes are created with 10 shards and 2 replicas for test purposes
>>
>> Both Elasticsearch 1.3.2 and 1.4.1
>>
>> *What we have noticed:*
>>
>> If I push data to one index only, response time starts at 80 to 100ms
>> for each batch inserted when the rate of insert is around 100 documents
>> per second. I ramp it up and I can reach 1600 before the response time
>> gets close to 1 sec per batch, and when I increase it to close to 1700,
>> it hits a wall at some point because of concurrent insertions and the
>> time spirals to 4 or 5 seconds. That said, if I reduce the rate of
>> inserts, Elasticsearch recovers nicely. CPU usage increases as the rate
>> increases.
>>
>> If I push to 2 indexes concurrently, I can reach a total of 1100 and CPU
>> goes up to 93% at around 900 documents per second.
>>
>> If I push to 3 indexes concurrently, I can reach a total of 150 and CPU
>> goes up to 95 to 97%. I tried it many times. The interesting thing is
>> that response time is around 109ms at the time. I can increase the load
>> to 900 and response time will still be around 400 to 600ms, but CPU
>> stays up.
>>
>> *Question:*
>>
>> Looking at our requirements and findings above, is the design suitable
>> for what's asked? Are there any tests that I can do to find out more?
>> Is there any setting that I need to check (and change)?
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/76ecd8bb-97cc-4125-8f1a-50de69c2790f%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0c441669-4f65-4f99-8402-d16814adc23e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
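
For anyone following the thread, here is a minimal Python sketch of the two suggestions quoted above: the shard arithmetic, and a bulk request body for per-customer daily indexes. The function names, the `customer-YYYY.MM.DD` index naming and the `event` type are assumptions made for illustration and are not taken from the setup described in the thread. The body layout follows the newline-delimited format of the Elasticsearch `_bulk` endpoint (an action line followed by the document source); sending it over HTTP is left out.

```python
import json
from datetime import date


def shard_copies_per_node(primaries, replicas, nodes):
    """Average number of shard copies each node holds for ONE index."""
    return primaries * (1 + replicas) / nodes


def daily_index_name(customer, day):
    """Hypothetical naming scheme: one index per customer per day."""
    return "{}-{}".format(customer, day.strftime("%Y.%m.%d"))


def build_bulk_body(index, doc_type, docs):
    """Build a newline-delimited body for the _bulk endpoint.

    Each document becomes two lines: an "index" action line, then the
    document source. The body has to end with a newline.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


# Settings from the original post: 10 shards, 2 replicas, 3 nodes.
current = shard_copies_per_node(primaries=10, replicas=2, nodes=3)
# Settings I plan to try: 3 shards, 1 replica, 3 nodes.
suggested = shard_copies_per_node(primaries=3, replicas=1, nodes=3)

index = daily_index_name("customer-a", date(2015, 1, 16))
body = build_bulk_body(index, "event", [{"value": 1}, {"value": 2}])

print(current, suggested, index)
print(body, end="")
```

With the numbers from the thread, this works out to 10 shard copies per node for every index under the current settings and 2 under the suggested ones, which is presumably why CPU climbs so quickly once several indexes are written to at the same time.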
