Re: ingest performance degrades sharply as documents have more fields

Maco Ma Tue, 17 Jun 2014 23:39:08 -0700

I tried your script with iwc.setRAMBufferSizeMB(40000) and a 48G heap.
The speed is around 430 docs/sec before the first flush and the final
speed is 350 docs/sec. I'm not sure what configuration Solr uses, but its
ingestion speed can reach 800 docs/sec.


Maco

On Wednesday, June 18, 2014 6:09:07 AM UTC+8, Michael McCandless wrote:
>
> I tested roughly your Scenario 2 (100K unique fields, 100 fields per 
> document) with a straight Lucene test (attached, but not sure if the list 
> strips attachments).  Net/net I see ~100 docs/sec with one thread ... which 
> is very slow.
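The attached Lucene test is not reproduced in the thread. Below is a minimal sketch of the document shape described (100K unique field names, 100 fields per document), assuming names are drawn uniformly at random from the pool. The class `ManyFieldsDocs`, the method `makeDoc`, and the `field_N` naming are hypothetical. The code only builds the documents; handing them to Lucene's `IndexWriter` is not shown.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;
import java.util.Set;

// Hypothetical generator for the "many unique fields" document shape.
// It only builds documents as field name -> value maps; it does not index them.
public class ManyFieldsDocs {
    static final int FIELDS_PER_DOC = 100;

    // Builds one document with FIELDS_PER_DOC distinct field names drawn
    // uniformly at random from a pool of uniqueFieldCount names.
    static Map<String, String> makeDoc(Random rnd, int uniqueFieldCount, int docId) {
        if (uniqueFieldCount < FIELDS_PER_DOC) {
            throw new IllegalArgumentException("pool must hold at least " + FIELDS_PER_DOC + " names");
        }
        Map<String, String> doc = new LinkedHashMap<>();
        while (doc.size() < FIELDS_PER_DOC) {
            int fieldId = rnd.nextInt(uniqueFieldCount);
            // A repeated name overwrites its own entry, so the loop just draws again.
            doc.put("field_" + fieldId, "value_" + docId + "_" + fieldId);
        }
        return doc;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42); // fixed seed for repeatable runs
        Set<String> distinctNames = new HashSet<>();
        for (int docId = 0; docId < 10_000; docId++) {
            distinctNames.addAll(makeDoc(rnd, 100_000, docId).keySet());
        }
        // Each distinct name here would become its own indexed field.
        System.out.println("distinct field names: " + distinctNames.size());
    }
}
```

With 10k documents of 100 fields drawn from a 100K pool, the count printed should be close to 100,000, i.e. nearly every name in the pool ends up as a separate field in the index.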
>
> Lucene stores quite a lot for each unique indexed field name and it's 
> really a bad idea to plan on having so many unique fields in the index: 
> you'll spend lots of RAM and CPU.
>
> Can you describe the wider use case here?  Maybe there's a more performant 
> way to achieve it...
>
>
>
> On Fri, Jun 13, 2014 at 2:40 PM, Cindy Hsin <[email protected]> wrote:
>
>> Hi, Mark:
>>
>> We are doing single-document ingestion. We ran a performance comparison
>> between Solr and Elasticsearch (ES).
>> ES performance degrades dramatically as we increase the number of metadata
>> fields, while Solr performance remains the same.
>> The test was run on a very small data set (i.e. 10k documents; the index
>> size is only 75MB). The machine is a high-spec machine with 48GB of memory.
>> You can see ES performance drop 50% even though the machine has plenty of
>> memory. ES consumes all of the machine's memory when the number of metadata
>> fields increases to 100k.
>> This behavior seems abnormal since the data is really tiny.
>>
>> We also tried larger data sets (i.e. 100k and 1M documents); ES throws
>> an out-of-memory error (OOM) in scenario 2 with 1M documents.
>> We want to know whether this is a bug in ES and/or whether there is any
>> workaround (configuration step) we can use to eliminate the performance
>> degradation.
>> Currently ES performance does not meet the customer's requirement, so we
>> want to see if there is any way to bring ES performance to the same level
>> as Solr.
>>
>> Below are the configuration settings and benchmark results for the
>> 10k-document set.
>> Scenario 0 means there are 1,000 different metadata fields in the system.
>> Scenario 1 means there are 10k different metadata fields in the system.
>> Scenario 2 means there are 100k different metadata fields in the system.
>> Scenario 3 means there are 1M different metadata fields in the system.
>>
>>    - disable automatic hard commit and soft commit, and use a *client* to
>>      commit every 10 seconds (ES & Solr)
>>       - ES: flush and refresh are disabled
>>       - Solr: autoSoftCommit is disabled
>>    - monitor system load (CPU, memory, etc.) and how the ingestion speed
>>      changes over time
>>    - monitor the ingestion speed (is there any degradation over time?)
>>    - new ES config: new_ES_config.sh
>>      <https://stbeehive.oracle.com/content/dav/st/Cloud%20Search/Documents/new_ES_config.sh>
>>    - new ES ingestion: new_ES_ingest_threads.pl
>>      <https://stbeehive.oracle.com/content/dav/st/Cloud%20Search/Documents/new_ES_ingest_threads.pl>
>>    - new Solr ingestion: new_Solr_ingest_threads.pl
>>      <https://stbeehive.oracle.com/content/dav/st/Cloud%20Search/Documents/new_Solr_ingest_threads.pl>
>>    - flush interval: 10s
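The linked Perl ingestion scripts are not shown in the thread. Below is a minimal Java sketch of the loop the settings above describe, under stated assumptions: `SearchClient` is a hypothetical interface (not the Elasticsearch or Solr client API), and the millisecond clock is injected so the loop can be tested. It sends one document per request, issues a client-side commit whenever 10 seconds have passed since the last one, and records the elapsed time for each 1k documents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical benchmark driver: server-side auto commit/refresh is assumed
// to be off, so the client alone decides when to commit.
public class IngestLoop {
    // Made-up client abstraction; a real run would wrap an HTTP client here.
    interface SearchClient {
        void index(Map<String, String> doc);
        void commit();
    }

    static final long COMMIT_INTERVAL_MS = 10_000; // commit every 10 seconds
    static final int BATCH = 1_000;                // report time per 1k docs

    // Returns the elapsed milliseconds for each consecutive block of 1k docs.
    static List<Long> run(SearchClient client, List<Map<String, String>> docs, LongSupplier clockMs) {
        List<Long> perBatchMs = new ArrayList<>();
        long lastCommit = clockMs.getAsLong();
        long batchStart = lastCommit;
        for (int i = 0; i < docs.size(); i++) {
            client.index(docs.get(i)); // single-document ingestion
            long now = clockMs.getAsLong();
            if (now - lastCommit >= COMMIT_INTERVAL_MS) {
                client.commit();       // client-driven commit
                lastCommit = now;
            }
            if ((i + 1) % BATCH == 0) {
                perBatchMs.add(now - batchStart);
                batchStart = now;
            }
        }
        client.commit(); // final commit so every document is visible
        return perBatchMs;
    }
}
```

In a real run the clock would be `System::currentTimeMillis`; the per-batch list corresponds to the "time(secs) for each 1k docs" rows in the results.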
>>
>>
>> Scenario (fields)  Engine  Total time    Rate (docs/sec)  CPU     Heap    Index size  iowait
>> 0 (1,000)          ES      12 s          833              30.24%  1.08G   36M         0.02%
>>                    Solr    13 s          769              28.85%  9.39G   -           -
>> 1 (10k)            ES      29 s          345              40.83%  5.74G   36M         0.02%
>>                    Solr    12 s          833              28.62%  9.88G   -           -
>> 2 (100k)           ES      17 min 44 s   9.4              54.73%  47.99G  75M         0.02%
>>                    Solr    13 s          769              29.43%  9.84G   -           -
>> 3 (1M)             ES      183 min 8 s   0.9              40.47%  47.99G  -           -
>>                    Solr    15 s          666.7            45.10%  9.64G   -           -
>> (- = not reported)
>>
>> Time (secs) for each successive 1k docs:
>> Scenario 0  ES:   3 1 1 1 1 1 0 1 2 1
>>             Solr: 2 1 1 1 1 1 1 1 2 2
>> Scenario 1  ES:   14 2 2 2 1 2 2 1 2 1
>>             Solr: 1 1 1 1 2 1 1 1 1 2
>> Scenario 2  ES:   97 183 196 147 109 89 87 49 66 40
>>             Solr: 2 1 1 1 1 1 1 1 2 2
>> Scenario 3  ES:   133 422 701 958 989 1322 1622 1615 1630 1594
>>             Solr: 2 1 1 1 1 2 1 1 3 2
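The rates in these results are 10,000 documents divided by the total run time, and the per-1k timings (rounded to whole seconds) should roughly add up to that total. Below is a small Java check of that arithmetic. The class and method names are illustrative only.

```java
import java.util.Locale;

// Recomputes ingestion rates from the reported timings.
public class IngestRate {
    // Documents per second for a run that indexed `docs` in `totalSecs`.
    static double docsPerSec(int docs, double totalSecs) {
        return docs / totalSecs;
    }

    // Sums the per-1k-docs timings to approximate the total run time.
    static long totalSecs(long[] secsPer1k) {
        long sum = 0;
        for (long s : secsPer1k) {
            sum += s;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Scenario 3, ES: 183 min 8 s for 10k docs.
        System.out.println(String.format(Locale.ROOT, "%.2f docs/sec",
                docsPerSec(10_000, 183 * 60 + 8)));               // 0.91 docs/sec
        // The per-1k timings for the same run sum to nearly the same total.
        long[] es3 = {133, 422, 701, 958, 989, 1322, 1622, 1615, 1630, 1594};
        System.out.println(totalSecs(es3) + " secs");             // 10986 secs
    }
}
```

The two-second gap between 10,988 s and 10,986 s comes from rounding each 1k-doc timing to whole seconds.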
>>
>> Thanks!
>> Cindy
>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/4efc9c2d-ead4-4702-896d-dc32b5867859%40googlegroups.com.
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

