I tried your script with iwc.setRAMBufferSizeMB(40000) and a 48G heap size. The speed is around 430 docs/sec before the first flush, and the final speed is 350 docs/sec. I'm not sure what configuration Solr uses that lets its ingestion speed reach 800 docs/sec.

Maco

On Wednesday, June 18, 2014 6:09:07 AM UTC+8, Michael McCandless wrote:
>
> I tested roughly your Scenario 2 (100K unique fields, 100 fields per
> document) with a straight Lucene test (attached, but not sure if the list
> strips attachments). Net/net I see ~100 docs/sec with one thread ... which
> is very slow.
>
> Lucene stores quite a lot for each unique indexed field name and it's
> really a bad idea to plan on having so many unique fields in the index:
> you'll spend lots of RAM and CPU.
>
> Can you describe the wider use case here? Maybe there's a more performant
> way to achieve it...
>
> On Fri, Jun 13, 2014 at 2:40 PM, Cindy Hsin <[email protected]> wrote:
>
>> Hi, Mark:
>>
>> We are doing single document ingestion. We did a performance comparison
>> between Solr and Elastic Search (ES).
>> The performance for ES degrades dramatically when we increase the
>> metadata fields where Solr performance remains the same.
>> The performance is done in very small data set (ie. 10k documents, the
>> index size is only 75mb). The machine is a high spec machine with 48GB
>> memory.
>> You can see ES performance drop 50% even when the machine have plenty
>> memory. ES consumes all the machine memory when metadata field increased to
>> 100k.
>> This behavior seems abnormal since the data is really tiny.
>>
>> We also tried with larger data set (ie. 100k and 1Mil documents), ES
>> throw OOW for scenario 2 for 1 Mil doc scenario.
>> We want to know whether this is a bug in ES and/or is there any
>> workaround (config step) we can use to eliminate the performance
>> degradation.
>> Currently ES performance does not meet the customer requirement so we
>> want to see if there is anyway we can bring ES performance to the same
>> level as Solr.
>>
>> Below is the configuration setting and benchmark results for 10k document
>> set.
>> scenario 0 means there are 1000 different metadata fields in the system.
>> scenario 1 means there are 10k different metatdata fields in the system.
>> scenario 2 means there are 100k different metadata fields in the system.
>> scenario 3 means there are 1M different metadata fields in the system.
>>
>> - disable hard-commit & soft commit + use a *client* to do commit (ES
>>   & Solr) every 10 second
>>     - ES: flush, refresh are disabled
>>     - Solr: autoSoftCommit are disabled
>> - monitor load on the system (cpu, memory, etc) or the ingestion
>>   speed change over time
>> - monitor the ingestion speed (is there any degradation over time?)
>> - new ES config: new_ES_config.sh
>>   <https://stbeehive.oracle.com/content/dav/st/Cloud%20Search/Documents/new_ES_config.sh>;
>>   new ingestion: new_ES_ingest_threads.pl
>>   <https://stbeehive.oracle.com/content/dav/st/Cloud%20Search/Documents/new_ES_ingest_threads.pl>
>> - new Solr ingestion: new_Solr_ingest_threads.pl
>>   <https://stbeehive.oracle.com/content/dav/st/Cloud%20Search/Documents/new_Solr_ingest_threads.pl>
>> - flush interval: 10s
>>
>> Number of different meta data field / ES / Solr
>>
>> Scenario 0: 1000
>>   ES:   12 secs -> 833 docs/sec
>>         CPU: 30.24%
>>         Heap: 1.08G
>>         time(secs) for each 1k docs: 3 1 1 1 1 1 0 1 2 1
>>         index size: 36M
>>         iowait: 0.02%
>>   Solr: 13 secs -> 769 docs/sec
>>         CPU: 28.85%
>>         Heap: 9.39G
>>         time(secs) for each 1k docs: 2 1 1 1 1 1 1 1 2 2
>>
>> Scenario 1: 10k
>>   ES:   29 secs -> 345 docs/sec
>>         CPU: 40.83%
>>         Heap: 5.74G
>>         time(secs) for each 1k docs: 14 2 2 2 1 2 2 1 2 1
>>         iowait: 0.02%
>>         Index Size: 36M
>>   Solr: 12 secs -> 833 docs/sec
>>         CPU: 28.62%
>>         Heap: 9.88G
>>         time(secs) for each 1k docs: 1 1 1 1 2 1 1 1 1 2
>>
>> Scenario 2: 100k
>>   ES:   17 mins 44 secs -> 9.4 docs/sec
>>         CPU: 54.73%
>>         Heap: 47.99G
>>         time(secs) for each 1k docs: 97 183 196 147 109 89 87 49 66 40
>>         iowait: 0.02%
>>         Index Size: 75M
>>   Solr: 13 secs -> 769 docs/sec
>>         CPU: 29.43%
>>         Heap: 9.84G
>>         time(secs) for each 1k docs: 2 1 1 1 1 1 1 1 2 2
>>
>> Scenario 3: 1M
>>   ES:   183 mins 8 secs -> 0.9 docs/sec
>>         CPU: 40.47%
>>         Heap: 47.99G
>>         time(secs) for each 1k docs: 133 422 701 958 989 1322 1622 1615 1630 1594
>>   Solr: 15 secs -> 666.7 docs/sec
>>         CPU: 45.10%
>>         Heap: 9.64G
>>         time(secs) for each 1k docs: 2 1 1 1 1 2 1 1 3 2
>>
>> Thanks!
>> Cindy
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/4efc9c2d-ead4-4702-896d-dc32b5867859%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/bb58c57b-37b1-46b2-b8b6-f26761cdd55f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
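
For anyone cross-checking the numbers: the docs/sec figures in the quoted benchmark appear to follow directly from the elapsed time for the 10k-document set. Below is a minimal Python sketch of that arithmetic. The function and variable names are hypothetical (they are not from the original scripts), and the timings are simply copied from the quoted results.

```python
# Sketch only: recompute docs/sec from the elapsed times quoted in this thread.
# Names here are illustrative assumptions, not part of the original benchmark scripts.

NUM_DOCS = 10_000  # the quoted results are for the 10k-document set

# Elapsed wall-clock seconds per scenario, copied from the quoted benchmark.
ELAPSED_SECS = {
    "Scenario 0 (1000 fields)": {"ES": 12, "Solr": 13},
    "Scenario 1 (10k fields)": {"ES": 29, "Solr": 12},
    "Scenario 2 (100k fields)": {"ES": 17 * 60 + 44, "Solr": 13},
    "Scenario 3 (1M fields)": {"ES": 183 * 60 + 8, "Solr": 15},
}


def docs_per_sec(num_docs, elapsed_secs):
    """Average ingestion throughput over the whole run."""
    return num_docs / elapsed_secs


def throughput_table(num_docs, elapsed):
    """Return {scenario: {engine: docs/sec}} for every scenario and engine."""
    return {
        scenario: {
            engine: docs_per_sec(num_docs, secs)
            for engine, secs in engines.items()
        }
        for scenario, engines in elapsed.items()
    }


if __name__ == "__main__":
    for scenario, rates in throughput_table(NUM_DOCS, ELAPSED_SECS).items():
        # How many times faster Solr ingested than ES in this scenario.
        ratio = rates["Solr"] / rates["ES"]
        print(
            f"{scenario}: ES {rates['ES']:.1f} docs/sec, "
            f"Solr {rates['Solr']:.1f} docs/sec, Solr/ES = {ratio:.1f}x"
        )
```

These are averages over the full run. The per-1k-docs timings in the quoted results show how the ES rate changes during a run, which a single average hides.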
