[jira] [Commented] (LUCENE-7007) Reduce block-tree GC/CPU cost when flushing or merging postings

Robert Muir (JIRA) Tue, 02 Feb 2016 11:22:24 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-7007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15128836#comment-15128836
 ]


Robert Muir commented on LUCENE-7007:
-------------------------------------

Do you think luceneutil is the best dataset for comparison? I think it's useful 
to see comparisons for "healthy" indexes like that too, but it's still a far cry 
from structured datasets (more DOCS_ONLY/terms-heavy) or abusive cases (e.g. 
massive n-gramming) where blocktree might be a performance bottleneck.
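
To make the n-gramming point concrete, here is a rough sketch of how 
character n-grams inflate the number of distinct terms that a terms 
dictionary would have to hold. It is plain Java with no Lucene dependency; 
the class and method names (NGramTerms, uniqueNGrams) are hypothetical, and 
it only counts distinct terms. It does not exercise blocktree or measure 
indexing cost.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Lucene.
final class NGramTerms {

    /** Collects the distinct character n-grams of length minGram..maxGram over all values. */
    static Set<String> uniqueNGrams(List<String> values, int minGram, int maxGram) {
        Set<String> terms = new HashSet<>();
        for (String value : values) {
            for (int n = minGram; n <= maxGram; n++) {
                // Slide a window of width n across the value.
                for (int start = 0; start + n <= value.length(); start++) {
                    terms.add(value.substring(start, start + n));
                }
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Sample values are made up; a real test would use a real corpus.
        List<String> values = List.of("lucene", "blocktree", "postings");
        System.out.println("whole-value terms: " + new HashSet<>(values).size());
        System.out.println("1..4-gram terms:   " + uniqueNGrams(values, 1, 4).size());
    }
}
```

A benchmark built along these lines would still need to index the generated 
terms (for example with DOCS_ONLY fields) to say anything about flush or 
merge cost.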

> Reduce block-tree GC/CPU cost when flushing or merging postings
> ---------------------------------------------------------------
>
>                 Key: LUCENE-7007
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7007
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: LUCENE-7007.patch
>
>
> Writing postings is a GC- and CPU-heavy operation now, in part because of how
> block tree recursively builds up the tree structure: it creates many
> tiny FSTs, which it inefficiently merges together as it walks up the
> tree, eventually reaching the root block.
> So I tried a quick prototype (patch attached) that takes a
> less-RAM-efficient approach but creates far fewer tiny FST-related
> objects when writing postings.
> But in some quick indexing performance tests (luceneutil), it makes no
> measurable improvement to indexing performance.
> So I'm putting my patch up here for posterity ... I don't intend to
> commit it unless we can iterate it further.  It adds code complexity,
> it's not committable as-is (we need to conditionalize it so it
> sometimes does use FSTs, for segments with many terms), etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

