[
https://issues.apache.org/jira/browse/LUCENE-10025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393782#comment-17393782
]
Suhan Mao commented on LUCENE-10025:
------------------------------------
[~dnhatn] I think [~zhangchao.es]'s question probably refers to this code:
{code:java}
// @Override
public int numDeletesToMerge(SegmentCommitInfo info, int delCount,
    IOSupplier<CodecReader> readerSupplier) throws IOException {
  final int numDeletesToMerge = super.numDeletesToMerge(info, delCount, readerSupplier);
  if (numDeletesToMerge != 0 && info.getSoftDelCount() > 0) {
    final CodecReader reader = readerSupplier.get();
    if (reader.getLiveDocs() != null) {
      // Match docs that carry the soft-deletes field AND satisfy the retention query.
      BooleanQuery.Builder builder = new BooleanQuery.Builder();
      builder.add(new DocValuesFieldExistsQuery(field), BooleanClause.Occur.FILTER);
      builder.add(retentionQuerySupplier.get(), BooleanClause.Occur.FILTER);
      // Run the query against a view of the reader with liveDocs removed,
      // so that deleted docs are visible to the scorer.
      Scorer scorer = getScorer(builder.build(),
          FilterCodecReader.wrapLiveDocs(reader, null, reader.maxDoc()));
      if (scorer != null) {
        DocIdSetIterator iterator = scorer.iterator();
        Bits liveDocs = reader.getLiveDocs();
        int numDeletedDocs = reader.numDeletedDocs();
        // Subtract every matching doc that is not live from the deleted count.
        while (iterator.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
          if (liveDocs.get(iterator.docID()) == false) {
            numDeletedDocs--;
          }
        }
        return numDeletedDocs;
      }
    }
  }
  // Remainder of the method omitted in the quoted snippet; otherwise the
  // superclass value is returned.
  return numDeletesToMerge;
}
{code}
Why do we have to iterate the scorer and check whether each doc ID is absent
from liveDocs? Since every doc ID returned by the scorer must contain the
soft-deletes field, it should never be in the live docs, so why do we need the
check *_liveDocs.get(iterator.docID()) == false_*?
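
To make the question concrete, here is a minimal sketch of the counting loop that uses plain java.util.BitSet instead of Lucene types. The class name NumDeletesSketch, the method signature, and the "retained" bit set (standing in for the docs returned by the scorer) are all assumptions made for illustration; this is not the Lucene implementation.

```java
import java.util.BitSet;

public class NumDeletesSketch {

    /**
     * Simplified model of the loop in the quoted method.
     * liveDocs: bit i is set when doc i is live.
     * retained: hypothetical stand-in for the docs returned by the scorer
     *           (soft-deletes field present AND retention query matches).
     */
    static int numDeletesToMerge(BitSet liveDocs, BitSet retained, int maxDoc) {
        // Start from the total number of deleted (non-live) docs.
        int numDeletedDocs = maxDoc - liveDocs.cardinality();
        for (int doc = retained.nextSetBit(0);
             doc >= 0 && doc < maxDoc;
             doc = retained.nextSetBit(doc + 1)) {
            // Only a doc that is not live reduces the count.
            if (!liveDocs.get(doc)) {
                numDeletedDocs--;
            }
        }
        return numDeletedDocs;
    }

    public static void main(String[] args) {
        int maxDoc = 6;
        BitSet liveDocs = new BitSet(maxDoc);
        liveDocs.set(0, 3);              // docs 0..2 are live, docs 3..5 are deleted

        BitSet retained = new BitSet(maxDoc);
        retained.set(3);                 // two deleted docs are still retained
        retained.set(4);
        System.out.println(numDeletesToMerge(liveDocs, retained, maxDoc)); // prints 1

        retained.set(2);                 // hypothetical: a live doc also matches
        System.out.println(numDeletesToMerge(liveDocs, retained, maxDoc)); // prints 1
    }
}
```

In this model the check is redundant whenever every matching doc is already non-live (first call). It only changes the result if a live doc can match (second call), in which case dropping the check would subtract one too many. Whether that second situation can occur in the real merge policy is exactly what the question above asks.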
> SoftDeletesRetentionMergePolicy#numDeletesToMerge caused indexing backlogged
> ----------------------------------------------------------------------------
>
> Key: LUCENE-10025
> URL: https://issues.apache.org/jira/browse/LUCENE-10025
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/index
> Affects Versions: 8.4
> Reporter: zhangchao.es
> Priority: Major
> Labels: indexing, soft-delete
> Attachments: flamegraph.html, image-2021-07-14-16-52-34-740.png
>
>
> In LUCENE-8246, numDeletesToMerge was added to SoftDeletesRetentionMergePolicy.
> If there are very many soft-deleted docs and they are also held by a retention
> lease, the numDeletesToMerge function has a performance issue.
> For instance, while update-heavy indexing is writing to Elasticsearch, we move
> a shard to another node. If the move continues for a long time, the old shard
> becomes very large, because soft-deleted operations need to be held
> by the retention lease. The more soft-deleted documents, the slower the indexing.
> If the shard size is about 20GB, we get the flamegraph below:
>
> !image-2021-07-14-16-52-34-740.png!
>