[ https://issues.apache.org/jira/browse/LUCENE-7007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15128836#comment-15128836 ]
Robert Muir commented on LUCENE-7007:
-------------------------------------
Do you think luceneutil is the best dataset for comparison? I think it's useful
to see comparisons for "healthy" indexes like that too, but it's still a far cry
from structured datasets (more DOCS_ONLY/terms-heavy) or abusive cases (e.g.
massive n-gramming) where blocktree might be a performance bottleneck.
> Reduce block-tree GC/CPU cost when flushing or merging postings
> ---------------------------------------------------------------
>
> Key: LUCENE-7007
> URL: https://issues.apache.org/jira/browse/LUCENE-7007
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Attachments: LUCENE-7007.patch
>
>
> Writing postings is a GC- and CPU-heavy operation now, in part because of how
> block tree recursively builds up the tree structure: it creates many
> tiny FSTs, which it inefficiently merges together as it walks up the
> tree, eventually reaching the root block.
> So I tried a quick prototype (patch attached) that uses a
> less RAM-efficient approach, but creates far fewer tiny FST-related
> objects, when writing postings.
> But in some quick indexing performance tests (luceneutil), it makes no
> measurable improvement to indexing performance.
> So I'm putting my patch up here for posterity ... I don't intend to
> commit it unless we can iterate on it further. It adds code complexity,
> it's not committable as-is (we need to conditionalize it so it
> sometimes does use FSTs, for segments with many terms), etc.