What is "the default JVM 64 MB limit"? Elasticsearch uses a 1 GB heap by
default, not 64 MB. Maybe you have an extra JVM with your bulk client that
uses 64 MB? That is much too little. Use a 4-6 GB heap if your machine
allows it.

Note that JVM 7 from OpenJDK/Oracle, which is recommended, uses 25% of your
host RAM by default for the heap, not 64 MB.
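For example, with the standard startup script you can size the heap via the
ES_HEAP_SIZE environment variable (4g here is just the suggested value;
adjust it to your machine):

```shell
# Give Elasticsearch a 4 GB heap (sets both -Xms and -Xmx).
export ES_HEAP_SIZE=4g
bin/elasticsearch
```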

1. You can use the BulkProcessor in the Java API, which also has a volume
limit per bulk instead of a document count; the default is 5 MB. 64 MB is a
very large bulk size. Bulk sizes of ~2 GB are very bad, since they thrash
the heap on the ES nodes, which induces severe GC pauses and delays. I
recommend 1-10 MB, so each bulk responds within 1 second and GC is very
fast. You can run bulks concurrently to increase throughput. To find the
sweet spot for your client/server setup, you have to experiment: choose
1 MB and 1 concurrent thread, then 2 MB and 1 thread, 2 MB and 2 threads,
and so on, until you see rates declining. ES has some internal settings
that prevent bulks from overrunning the whole cluster.
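To illustrate the volume-based chunking idea, here is a self-contained
sketch (class and method names are mine, not the ES API; the real
BulkProcessor does this internally and also flushes asynchronously) that
cuts a stream of documents into bulks capped at a byte budget:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch of size-based bulk chunking, analogous to BulkProcessor's
// volume limit. Names are illustrative, not part of the ES API.
public class BulkChunker {

    // Split documents into bulks whose total size stays under maxBytes.
    // A bulk is flushed as soon as adding the next document would
    // push it over the budget; oversized single documents still go
    // out alone rather than being dropped.
    public static List<List<String>> chunk(List<String> docs, long maxBytes) {
        List<List<String>> bulks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long size = 0;
        for (String doc : docs) {
            long docSize = doc.getBytes(StandardCharsets.UTF_8).length;
            if (!current.isEmpty() && size + docSize > maxBytes) {
                bulks.add(current);          // flush the full bulk
                current = new ArrayList<>();
                size = 0;
            }
            current.add(doc);
            size += docSize;
        }
        if (!current.isEmpty()) {
            bulks.add(current);              // flush the remainder
        }
        return bulks;
    }
}
```

With a 1-10 MB budget each of these bulks stays small enough that the
server answers quickly and GC on the nodes remains cheap; concurrency is
then added on top by sending several such bulks in flight at once.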

2. Most important is to set the replica count to 0 to make room for better
performance while bulk indexing, and to disable refresh by setting the
refresh interval from the default 1s to -1. After the bulk run, re-enable
refresh, optimize, and add the replicas back. There are other, more
advanced knobs, like store-level throttling or thread pool and queue sizes,
but changing those defaults does not influence bulk performance that much.
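Via the REST API, the sequence could look like this (the index name
"myindex" and the replica count of 1 are placeholders for your own values):

```shell
# Before bulk indexing: drop replicas and disable refresh.
curl -XPUT 'localhost:9200/myindex/_settings' -d '{
  "index" : { "number_of_replicas" : 0, "refresh_interval" : "-1" }
}'

# ... run your bulk indexing here ...

# After bulk indexing: re-enable refresh, optimize, restore replicas.
curl -XPUT 'localhost:9200/myindex/_settings' -d '{
  "index" : { "refresh_interval" : "1s" }
}'
curl -XPOST 'localhost:9200/myindex/_optimize'
curl -XPUT 'localhost:9200/myindex/_settings' -d '{
  "index" : { "number_of_replicas" : 1 }
}'
```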

Jörg
