On Wed, Jun 18, 2014 at 2:38 AM, Maco Ma <[email protected]> wrote:
> I tried your script with setting iwc.setRAMBufferSizeMB(40000) and a 48 GB
> heap size. The speed can be around 430 docs/sec before the first flush, and
> the final speed is 350 docs/sec. Not sure what configuration Solr uses such
> that its ingestion speed can be 800 docs/sec.

Well, probably the difference is threads? That simple Lucene test uses only one thread, but your ES/Solr test uses 10 threads.

I think the cost in ES is how the MapperService maintains mappings for all fields; I don't think there's a quick fix to reduce that cost.

But net/net, you really need to take a step back and re-evaluate your approach here: even if you use Solr, indexing at 800 docs/sec with 10 threads is awful indexing performance, and that's because Lucene itself has a high per-field cost, at both indexing time and search time. E.g., have you tried opening a searcher once you've built a large index with so many unique fields? The heap usage will be very high. Have you tested search performance on that searcher? Merging cost will be very high, etc.

Lucene is just not optimized for the "zillions of unique fields" case, because you can so easily move those N fields into a single field; e.g., if this is just for simple term filtering, make a single field and then insert "fieldName:fieldValue" as your tokens. If you insist on creating so many unique fields in your use case, you will be unhappy down the road with Lucene ...

Mike McCandless
http://blog.mikemccandless.com
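To illustrate the single-field trick Mike describes, here is a minimal sketch in plain Java (no Lucene dependency; the class and method names are mine, not from the thread). It collapses a document's many sparse key/value fields into "fieldName:fieldValue" tokens that would all be indexed into one catch-all field, which a term filter could then match with a single TermQuery against that field:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SingleFieldTokens {

    // Collapse a document's sparse fields into tokens for one catch-all
    // field: the pair ("color", "red") becomes the token "color:red".
    // Indexing these tokens into a single field avoids creating one
    // Lucene field (and one mapping entry) per unique key.
    static List<String> toTokens(Map<String, String> fields) {
        List<String> tokens = new ArrayList<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            tokens.add(e.getKey() + ":" + e.getValue());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Two sparse attributes that would otherwise be two unique fields.
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("color", "red");
        doc.put("size", "xl");
        System.out.println(toTokens(doc)); // prints [color:red, size:xl]
    }
}
```

At index time these tokens would be added to one field (e.g. a StringField per token, or a whitespace-analyzed text field), and "filter by color=red" becomes a term lookup for "color:red" in that single field, so the number of distinct Lucene fields stays constant no matter how many attribute names the documents carry.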
