Hi,

We want to understand how segments are created during bulk indexing.
Say we've set the following parameters:

"index.translog.flush_threshold_ops": "50000",
"index.translog.flush_threshold_size": "300mb"
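For reference, we apply these dynamically via the index update settings API rather than elasticsearch.yml; a minimal sketch (the index name `myindex` and host are placeholders):

```shell
# Apply the translog flush thresholds to an existing index (hypothetical index name).
curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
  "index.translog.flush_threshold_ops": "50000",
  "index.translog.flush_threshold_size": "300mb"
}'
```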

So this means ES will not flush until it has accumulated 50,000 operations (index operations, in our case). As a result, 50,000 documents should get flushed/committed to Lucene at a time, so we expected Lucene would not create segments with fewer than 50,000 documents.
But in our benchmark with these settings, we found lots of segments with, say, ~3,000 documents, each far smaller than 300mb (the flush threshold).
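For what it's worth, this is how we looked at the per-segment document counts, using the indices segments API (index name is again a placeholder):

```shell
# List segments per shard, with document counts and on-disk sizes (hypothetical index name).
curl 'http://localhost:9200/myindex/_segments?pretty'
```

Each segment entry in the response reports a document count and an on-disk size, which is where the ~3,000-document segments show up.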

My questions are:
- How do these small segments get generated, given that we flush 50,000 documents at a time?
- Does avoiding small segments help indexing speed and merge speed?

We are using Elasticsearch v1.0.1, and we also set the following while benchmarking:

{
  "index": {
    "merge.policy.max_merge_at_once": "999",
    "merge.policy.segments_per_tier": "999",
    "refresh_interval": "-1"
  }
}
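We apply those through the update settings API as well; a sketch (index name is a placeholder), remembering to restore `refresh_interval` once the bulk load finishes:

```shell
# Relax the merge policy and disable refresh for the bulk load (hypothetical index name).
curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
  "index": {
    "merge.policy.max_merge_at_once": "999",
    "merge.policy.segments_per_tier": "999",
    "refresh_interval": "-1"
  }
}'

# After the bulk load, restore the default refresh interval.
curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
  "index": { "refresh_interval": "1s" }
}'
```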

Thanks.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
