[
https://issues.apache.org/jira/browse/LUCENE-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13138695#comment-13138695
]
Roman Alekseenkov commented on LUCENE-2680:
-------------------------------------------
thank you, Robert
I was asking because we are having issues with 3.4.0 where applyDeletes() takes
an large amount of time on commit for 150GB index, and this is stopping all
indexing threads. it looks like applyDeletes() is re-scanning an entire index,
even though it's unnecessary as we are only adding documents to the index but
not deleting them
if this optimization was backported, then I will probably have to find a
solution for my problem elsewhere...
> Improve how IndexWriter flushes deletes against existing segments
> -----------------------------------------------------------------
>
> Key: LUCENE-2680
> URL: https://issues.apache.org/jira/browse/LUCENE-2680
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch,
> LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch,
> LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch,
> LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch,
> LUCENE-2680.patch, LUCENE-2680.patch
>
>
> IndexWriter buffers up all deletes (by Term and Query) and only
> applies them if 1) commit or NRT getReader() is called, or 2) a merge
> is about to kickoff.
> We do this because, for a large index, it's very costly to open a
> SegmentReader for every segment in the index. So we defer as long as
> we can. We do it just before merge so that the merge can eliminate
> the deleted docs.
> But, most merges are small, yet in a big index we apply deletes to all
> of the segments, which is really very wasteful.
> Instead, we should only apply the buffered deletes to the segments
> that are about to be merged, and keep the buffer around for the
> remaining segments.
> I think it's not so hard to do; we'd have to have generations of
> pending deletions, because the newly merged segment doesn't need the
> same buffered deletions applied again. So every time a merge kicks
> off, we pinch off the current set of buffered deletions, open a new
> set (the next generation), and record which segment was created as of
> which generation.
> This should be a very sizable gain for large indices that mix
> deletes, though, less so in flex since opening the terms index is much
> faster.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]