Hi!
After upgrading our ES cluster from version 6.2 to 7.9, we found that force merge
operations take much longer, about double the previous latency.
Based on our investigation, we found that the following is the main cause of the
force-merge performance decrease:
* Starting with Lucene 8.0, a NormsProducer is passed as an input parameter to the
function mergeTerms in org.apache.lucene.index.SegmentMerger.java.
<< Cause analysis
Starting with Lucene 8.0, SegmentMerger.merge() opens a NormsProducer whenever any
of the merged fields has norms, and passes its merge instance to mergeTerms. If no
field has norms, null is passed instead.
The function mergeTerms writes the .tim, .tip, .doc, .pos and .pay files for each
term, so this change makes the terms merge depend on the norms settings of the fields.
< merge() function before Lucene 8.0
mergeTerms(segmentWriteState);
< merge() function in Lucene 8.0
try (NormsProducer norms = mergeState.mergeFieldInfos.hasNorms()
    ? codec.normsFormat().normsProducer(segmentReadState)
    : null) {
  NormsProducer normsMergeInstance = null;
  if (norms != null) {
    // Use the merge instance in order to reuse the same IndexInput for all terms
    normsMergeInstance = norms.getMergeInstance();
  }
  mergeTerms(segmentWriteState, normsMergeInstance);
}
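The code above only shows that norms now reach mergeTerms; it does not show what is done with them. One plausible model, not verified here against the Lucene 8 postings writer source, is that the writer looks up the norm of every document in a term's posting list, and, because a norms iterator can only move forward while doc IDs restart for each term, has to open a fresh iterator per term. The self-contained toy below counts that extra work under this assumption. All class and method names in it are hypothetical and are not Lucene APIs.

```java
import java.util.List;

/**
 * Toy model (hypothetical, not Lucene's API): while writing each term's postings,
 * look up the norm of every document in that term's posting list.
 */
public class NormsLookupModel {

    /** Forward-only cursor over per-document norms, loosely mimicking an iterator that can only advance. */
    static final class NormsCursor {
        private final long[] normsByDoc;
        private int current = -1;

        NormsCursor(long[] normsByDoc) {
            this.normsByDoc = normsByDoc;
        }

        long normOf(int doc) {
            if (doc < current) {
                throw new IllegalStateException("cursor cannot move backwards");
            }
            current = doc;
            return normsByDoc[doc];
        }
    }

    /** Counters for the extra work done only because norms are present. */
    static final class Stats {
        int cursorsOpened;
        long normLookups;
    }

    /** Simulates writing postings; each int[] holds the ascending doc IDs of one term. */
    static Stats mergeTerms(List<int[]> postings, long[] normsByDoc) {
        Stats stats = new Stats();
        for (int[] docs : postings) {
            if (normsByDoc == null) {
                continue; // norms omitted: no extra per-document work in this model
            }
            // Doc IDs restart for every term, so a forward-only cursor is reopened per term.
            NormsCursor cursor = new NormsCursor(normsByDoc);
            stats.cursorsOpened++;
            for (int doc : docs) {
                cursor.normOf(doc); // one norm lookup per posting
                stats.normLookups++;
            }
        }
        return stats;
    }

    public static void main(String[] args) {
        List<int[]> postings = List.of(new int[] {0, 2, 5}, new int[] {1, 2});
        long[] norms = {3, 1, 4, 1, 5, 9};

        Stats withNorms = mergeTerms(postings, norms);
        Stats withoutNorms = mergeTerms(postings, null);
        System.out.println("with norms: cursors=" + withNorms.cursorsOpened
                + ", lookups=" + withNorms.normLookups);
        System.out.println("without norms: cursors=" + withoutNorms.cursorsOpened
                + ", lookups=" + withoutNorms.normLookups);
    }
}
```

In this model the extra cost grows with the total number of postings (every term times every document containing it), and it disappears entirely when norms are omitted, which would be consistent with the results of test 2.2.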
<< Test cases and result
To validate that the change above is the main cause of the force-merge
performance decrease, we designed some test cases.
< Test environment
* ES cluster: 3 master nodes / 1 client node / 3 data nodes (i3.2xlarge)
* Data: 13,216,068 docs
* Index: 3 primary shards, 0 replicas
< Test steps
1. Modify the merge policy settings and norms settings in the ES mapping file.
2. Load data into the ES cluster and record the running duration.
3. Run index_name/_flush.
4. Run _cat/segments and save the output.
5. Run _forcemerge.
6. Run _cat/segments and save the output.
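A minimal sketch of how steps 1 and 3 to 6 could be scripted with the JDK's java.net.http client (Java 11+). The base URL, index name and field names are placeholders (assumptions), the mapping helper emits only a 7.x-style "properties" object (6.x mappings also need a type name), merge policy settings are not modeled, and no _forcemerge parameters are set because the steps above do not specify any. Timing assumes the force merge request blocks until the merge completes, as it does by default.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

public class ForceMergeSteps {

    /** Step 1 (partial): "properties" with norms disabled for every text/keyword field; names must need no JSON escaping. */
    static String propertiesWithoutNorms(Map<String, String> fieldTypes) {
        StringJoiner fields = new StringJoiner(",", "{\"properties\":{", "}}");
        for (Map.Entry<String, String> e : fieldTypes.entrySet()) {
            String type = e.getValue();
            boolean omitNorms = type.equals("text") || type.equals("keyword");
            fields.add("\"" + e.getKey() + "\":{\"type\":\"" + type + "\""
                    + (omitNorms ? ",\"norms\":false" : "") + "}");
        }
        return fields.toString();
    }

    static HttpRequest post(String baseUrl, String path) {
        return HttpRequest.newBuilder(URI.create(baseUrl + path))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    static HttpRequest get(String baseUrl, String path) {
        return HttpRequest.newBuilder(URI.create(baseUrl + path)).GET().build();
    }

    /** Steps 3-6 in order: flush, segments before, force merge, segments after. */
    static List<HttpRequest> buildSteps(String baseUrl, String index) {
        return List.of(
                post(baseUrl, "/" + index + "/_flush"),
                get(baseUrl, "/_cat/segments/" + index + "?v"),
                post(baseUrl, "/" + index + "/_forcemerge"),
                get(baseUrl, "/_cat/segments/" + index + "?v"));
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length < 2) {
            System.err.println("usage: ForceMergeSteps <baseUrl> <index>");
            return;
        }
        HttpClient client = HttpClient.newHttpClient();
        for (HttpRequest request : buildSteps(args[0], args[1])) {
            long start = System.nanoTime();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            Duration elapsed = Duration.ofNanos(System.nanoTime() - start);
            // Print each call with its status and wall-clock duration, then the response body.
            System.out.println(request.method() + " " + request.uri() + " -> "
                    + response.statusCode() + " in " + elapsed.toMillis() + " ms");
            System.out.println(response.body());
        }
    }
}
```

Running it as `java ForceMergeSteps.java http://localhost:9200 index_name` would print the segment listings before and after the force merge, plus the duration of each call.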
< Test result
No. | ES version | Lucene version | Norms omitted                    | Force merge time
----+------------+----------------+----------------------------------+-----------------
1.1 | 6.8.13     | 7.7.2          | no                               | 13 min
1.2 | 6.8.13     | 7.7.2          | yes, for all text/keyword fields | 14 min
2.1 | 7.9.1      | 8.6.2          | no                               | 31 min
2.2 | 7.9.1      | 8.6.2          | yes, for all text/keyword fields | 13 min
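The table can be reduced to a few ratios that isolate the effect of norms on each Lucene version. A minimal sketch using only the force merge times from the table:

```java
import java.util.Locale;

/** Ratios computed from the force-merge times in the test-result table (minutes). */
public class ForceMergeRatios {

    static double ratio(double minutes, double baselineMinutes) {
        return minutes / baselineMinutes;
    }

    public static void main(String[] args) {
        // Lucene 7.7.2 (ES 6.8.13): norms kept (1.1) vs. norms omitted (1.2)
        double lucene7 = ratio(13, 14);
        // Lucene 8.6.2 (ES 7.9.1): norms kept (2.1) vs. norms omitted (2.2)
        double lucene8 = ratio(31, 13);
        // Upgrade effect with norms omitted: 2.2 vs. 1.2
        double upgradeNoNorms = ratio(13, 14);

        // Locale.ROOT keeps the decimal separator as '.' regardless of the JVM's default locale.
        System.out.println(String.format(Locale.ROOT, "norms cost on Lucene 7.7.2: %.2fx", lucene7));
        System.out.println(String.format(Locale.ROOT, "norms cost on Lucene 8.6.2: %.2fx", lucene8));
        System.out.println(String.format(Locale.ROOT, "upgrade, norms omitted: %.2fx", upgradeNoNorms));
    }
}
```

On these numbers, keeping norms slows the force merge by roughly 2.4x on Lucene 8.6.2 but has no measurable cost on Lucene 7.7.2, and with norms omitted the upgrade itself costs nothing, which is consistent with the cause analysis above.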
<< My questions are:
1. Why does this norms-related change cause such an obvious force-merge
performance decrease?
2. Is there any way to resolve it and improve force-merge performance in
Lucene 8.0+?
We look forward to your answer, and thanks a lot for your help.
Eileen Xie