On Wed, Jun 18, 2014 at 2:38 AM, Maco Ma <[email protected]> wrote:

> I tried your script with iwc.setRAMBufferSizeMB(40000) and a 48 GB
> heap size. The speed can be around 430 docs/sec before the first flush and
> the final speed is 350 docs/sec. Not sure what configuration Solr uses and
> its ingestion speed can be 800 docs/sec.
>

Well, probably the difference is threads?  That simple Lucene test uses
only 1 thread, but your ES/Solr test uses 10 threads.
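For reference, a multi-threaded indexing driver usually looks roughly like the sketch below: a fixed pool of worker threads all feeding one shared writer (Lucene's IndexWriter.addDocument is thread-safe, so in a real test the stub `addDocument` below would be that call; the stub, the thread/doc counts, and the class name are all illustrative assumptions, not code from the original tests):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class ThreadedIndexSketch {
    static final AtomicLong indexed = new AtomicLong();

    // Stand-in for IndexWriter.addDocument(...), which is thread-safe
    // in Lucene; here we only count documents.
    static void addDocument(int docId) {
        indexed.incrementAndGet();
    }

    public static void main(String[] args) throws Exception {
        int numThreads = 10;   // the ES/Solr tests in this thread used 10 threads
        int numDocs = 10_000;  // arbitrary for the sketch
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        for (int i = 0; i < numDocs; i++) {
            final int docId = i;
            pool.submit(() -> addDocument(docId));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("indexed " + indexed.get() + " docs");
    }
}
```

With one thread the same loop runs serially, which by itself can explain a large chunk of the docs/sec gap between the two tests.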

I think the cost in ES is how the MapperService maintains mappings for all
fields; I don't think there's a quick fix to reduce this cost.

But net/net you really need to take a step back and re-evaluate your
approach here: even if you use Solr, indexing at 800 docs/sec with 10
threads is poor indexing performance, and the reason is that Lucene itself
has a high per-field cost, at both indexing time and search time.  E.g.,
have you tried opening a searcher once you've built a large index with so
many unique fields?  The heap usage will be very high.  Have you tested
search performance on that searcher?  Merging cost will be very high, etc.

Lucene is just not optimized for the "zillions of unique fields" case,
because you can so easily move those N fields into a single field; e.g., if
this is just for simple term filtering, make a single catch-all field and
index "fieldName:fieldValue" as its tokens.
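The collapsing step itself is just string manipulation; a minimal stdlib-only sketch (the class and method names are made up for illustration, and the actual Lucene indexing of the resulting tokens is omitted):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class FieldCollapse {
    // Collapse many logical fields into tokens for one catch-all field.
    // Each token is "fieldName:fieldValue", so a term filter on a
    // (field, value) pair becomes a term filter on the single field.
    static List<String> collapse(Map<String, String> fields) {
        List<String> tokens = new ArrayList<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            tokens.add(e.getKey() + ":" + e.getValue());
        }
        return tokens;
    }

    public static void main(String[] args) {
        Map<String, String> doc = Map.of("color", "red", "size", "xl");
        // Index collapse(doc) as the tokens of one field instead of
        // creating a "color" field and a "size" field.
        System.out.println(collapse(doc));
    }
}
```

At query time, filtering on color=red becomes a term query for "color:red" against the single field, so the index sees one field no matter how many logical fields the documents have.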

If you insist on creating so many unique fields in your use case, you will
be unhappy down the road with Lucene ...

Mike McCandless

http://blog.mikemccandless.com

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
