Re: Elasticsearch ingest performance

dave Thu, 23 Apr 2015 05:01:24 -0700

Just some thoughts.  Yeah, with 16 cores per machine and 10 machines, 
5 shards per index is probably too low.
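
To put rough numbers on that, here is a small sketch using the figures 
from Brian's post.  The "two primaries per node" starting point at the end 
is only an assumption for illustration, not a rule; the right count has to 
be found by testing.

```python
# Figures from the original post.
nodes = 10
primary_shards = 5
replicas = 0

# Each shard copy lives on exactly one node, so no more than this many
# nodes can be doing indexing work for the index at any one time.
shard_copies = primary_shards * (1 + replicas)
busy_nodes = min(nodes, shard_copies)
idle_nodes = nodes - busy_nodes
print(f"{busy_nodes} of {nodes} nodes hold a shard; {idle_nodes} sit idle")

# Hypothetical starting point for experiments: a couple of primaries per node.
candidate_primaries = nodes * 2
print(f"candidate primary shard count to test: {candidate_primaries}")
```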


What are your system metrics telling you?  Are the CPUs idle?  What does 
the CPU I/O wait look like?  Are you doing single-document index operations 
or bulk index operations with YCSB?
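
In case it helps, here is a minimal sketch of what batching looks like 
with the bulk API: documents are grouped into batches, and each batch 
becomes one newline-delimited request body (an action line followed by a 
source line per document).  It also rotates batches across nodes, as 
suggested in the quoted message below.  The host names, index name, type 
name and batch size are all made up, nothing is actually sent, and it 
assumes an HTTP client rather than the Java Node Client that YCSB uses.

```python
import json
from itertools import cycle, islice

def chunked(docs, size):
    """Yield successive lists of at most `size` documents."""
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

def build_bulk_body(batch, index, doc_type):
    """Build a _bulk body: one action line, then one source line, per document."""
    lines = []
    for doc in batch:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"

# Hypothetical inputs, for illustration only.
hosts = cycle(["http://node1:9200", "http://node2:9200"])
docs = [{"field0": "value %d" % i} for i in range(5)]
batch_size = 2  # tiny on purpose; real bulk sizes need experimenting

# Pair each batch with the next node in the rotation.
requests = []
for batch in chunked(docs, batch_size):
    url = next(hosts) + "/_bulk"
    requests.append((url, build_bulk_body(batch, "ycsb", "usertable")))

for url, body in requests:
    print(url, len(body.splitlines()), "lines")
```

Each `(url, body)` pair would be one POST.  Raising `batch_size` and 
watching docs/sec is the experiment to run.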

Another thing to think about.  YCSB was built to test the key/value 
performance properties of a database.  If I remember correctly, the values 
put into the strings are randomly generated.  Pure random data is about the 
worst case possible for term cardinality when it comes to full-text 
indexing data structures, so you might want to adjust for that when 
creating your mapping for the index.  If the values are pure random rather 
than randomly pulled from a fixed-size dictionary (English only has 200k or 
so words), then the data you are putting in may be penalizing ES for having 
indexing features turned on by default.
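
As a rough illustration of adjusting the mapping, the sketch below builds 
an ES 1.x style index body that disables the `_all` field and marks each 
string field as `not_analyzed`, so the random values are not run through 
the full-text analyzer.  The type name `usertable` and the field names 
`field0` to `field9` are assumptions about what YCSB writes, so check them 
against your actual documents.  Setting `"index": "no"` instead would skip 
indexing a field entirely, at the cost of not being able to search on it.

```python
import json

FIELD_COUNT = 10  # from the original post: 10 string fields per document

def build_mapping(doc_type, field_count):
    """Return an index body whose string fields are indexed without analysis."""
    properties = {
        "field%d" % i: {"type": "string", "index": "not_analyzed"}
        for i in range(field_count)
    }
    return {
        "mappings": {
            doc_type: {
                # Skip the catch-all field that re-indexes every value.
                "_all": {"enabled": False},
                "properties": properties,
            }
        }
    }

# "usertable" is an assumed type name, not taken from the thread.
mapping = build_mapping("usertable", FIELD_COUNT)
print(json.dumps(mapping, indent=2, sort_keys=True))
```

The printed JSON would be the body passed when creating the index, before 
loading any data.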


On Thursday, April 23, 2015 at 5:25:30 AM UTC-4, Michael McCandless wrote:
>
> You can try the ideas here too: 
> https://www.elastic.co/blog/performance-considerations-elasticsearch-indexing
>
> Mike McCandless
>
> On Wed, Apr 22, 2015 at 8:00 PM, Kimbro Staken <kst...@kstaken.com> wrote:
>
>> Hello Brian,
>>
>> Many things will affect the rate of ingest; the biggest one is making 
>> sure the load gets spread around. But are you sure ES is what's 
>> bottlenecking here? With only 5 shards you're only using half your 
>> cluster, but I'm willing to bet your 20 threads on the importer aren't 
>> maxing that out. Also, you need to make sure the import process is 
>> spreading connections across the nodes; otherwise you may be limited in 
>> other ways by the node you're connecting to. Finally, make sure the client 
>> is using bulk requests, and experiment with the bulk sizes.
>>
>> FYI, I've been testing a new system configuration using an 8-core Avoton 
>> CPU with 6 x SSDs in a RAID 0. On this system (single node), ingest can 
>> sustain around 3,500 docs/sec of similar size to your load before it 
>> becomes CPU bound. You have much more CPU capacity, so I would expect your 
>> hardware to be able to exceed this by a fair margin, but your current 
>> numbers don't show that.
>>
>>
>> Kimbro Staken
>>
>> On Wed, Apr 22, 2015 at 4:16 PM, <bpar...@maprtech.com> wrote:
>>
>>> We are running a 10-node Elasticsearch 1.4.2 cluster and getting 
>>> cluster-wide throughput of 18,161 docs/sec, or about 18 MB/sec.  We'd 
>>> like to improve this as much as we can without impacting query times too 
>>> much.
>>>
>>> Our hardware:
>>>
>>> RAM: 128GB
>>> Disks: 8 disks, 7200 RPM, 1TB in a RAID 0 array
>>> CPU: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz - 16 physical cores, 32 
>>> HT cores
>>> Network: 1 x 10GbE
>>>
>>> They are running CentOS 6.5 and Java version 1.7.0_67.  We're setting 
>>> the Elasticsearch heap size to 30GB.
>>>
>>> We are testing ingest by inserting 10GB of data with YCSB.  Document 
>>> sizes are 1KB, with 10 string fields, each 100 bytes.  There is 1 YCSB 
>>> client with 20 threads, writing to a single index with 5 shards and 0 
>>> replicas.  YCSB connects using the Java Node Client.
>>>
>>> What is the expected ingest rate in this type of environment?  What 
>>> parameters are recommended to increase the ingest rate?
>>>
>>> Thanks,
>>>
>>> Brian
>>>
>>>  -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/elasticsearch/5733e9d0-b877-4dc3-b5c1-d341365ec6b2%40googlegroups.com
>>>  
>>> <https://groups.google.com/d/msgid/elasticsearch/5733e9d0-b877-4dc3-b5c1-d341365ec6b2%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>
>
>

