[ https://issues.apache.org/jira/browse/LUCENE-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16352218#comment-16352218 ]
Adrien Grand commented on LUCENE-4198:
--------------------------------------
This change had an impact on the nightly benchmarks:
- AndHighHigh: -3%
[http://people.apache.org/~mikemccand/lucenebench/AndHighHigh.html]
- AndHighMed: -4%
[http://people.apache.org/~mikemccand/lucenebench/AndHighMed.html]
- AndHighOrMedMed: -4%
[http://people.apache.org/~mikemccand/lucenebench/AndHighOrMedMed.html]
- AndMedOrHighHigh: -5%
[http://people.apache.org/~mikemccand/lucenebench/AndMedOrHighHigh.html]
However, indexing speed and term queries seem unaffected, and phrase queries
only show a very minor slowdown (~1%), which might be noise. I'll look into it
later, but I think this is an acceptable slowdown given that it allows top-k
queries to be sped up.
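To illustrate why indexed impacts can speed up top-k queries, here is a minimal Java sketch of block-max skipping for a single term. It assumes, hypothetically, that postings are grouped into blocks and that each block records the maximum score any of its documents can reach. The `Block`, `Hit` and `Result` records, the `topK` method and the precomputed scores are illustrative only, not Lucene's codec or query API. A block is skipped whenever its max score cannot beat the current k-th best hit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class BlockMaxTopK {

    // Hypothetical block of postings: doc ids, their scores, and the maximum
    // score in the block (standing in for the impacts a codec would index).
    record Block(int[] docs, float[] scores, float maxScore) {}

    record Hit(int doc, float score) {}

    record Result(List<Integer> docs, int blocksScanned) {}

    // Collects the top-k docs (best first, k >= 1), skipping every block
    // whose max score cannot beat the current k-th best score.
    static Result topK(List<Block> blocks, int k) {
        // Min-heap on score: the head is the current k-th best hit.
        PriorityQueue<Hit> heap =
            new PriorityQueue<>((a, b) -> Float.compare(a.score(), b.score()));
        int scanned = 0;
        for (Block block : blocks) {
            if (heap.size() == k && block.maxScore() <= heap.peek().score()) {
                continue; // no doc in this block can enter the top k
            }
            scanned++;
            for (int i = 0; i < block.docs().length; i++) {
                Hit hit = new Hit(block.docs()[i], block.scores()[i]);
                if (heap.size() < k) {
                    heap.add(hit);
                } else if (hit.score() > heap.peek().score()) {
                    heap.poll();
                    heap.add(hit);
                }
            }
        }
        List<Integer> docs = new ArrayList<>();
        while (!heap.isEmpty()) {
            docs.add(0, heap.poll().doc()); // heap pops the worst hit first
        }
        return new Result(docs, scanned);
    }

    public static void main(String[] args) {
        List<Block> blocks = List.of(
            new Block(new int[] {0, 1, 2}, new float[] {1.0f, 3.0f, 2.0f}, 3.0f),
            new Block(new int[] {3, 4, 5}, new float[] {0.5f, 1.0f, 1.5f}, 1.5f),
            new Block(new int[] {6, 7, 8}, new float[] {4.0f, 0.2f, 2.5f}, 4.0f));
        Result r = topK(blocks, 2);
        System.out.println(r.docs() + " scanned=" + r.blocksScanned()); // prints [6, 1] scanned=2
    }
}
```

With k = 2, the k-th best score is already 2.0 after the first block, so the middle block (max 1.5) is never scored; the benefit grows with the number of blocks whose maximum falls below that threshold.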
> Allow codecs to index term impacts
> ----------------------------------
>
> Key: LUCENE-4198
> URL: https://issues.apache.org/jira/browse/LUCENE-4198
> Project: Lucene - Core
> Issue Type: Sub-task
> Components: core/index
> Reporter: Robert Muir
> Priority: Major
> Fix For: master (8.0)
>
> Attachments: LUCENE-4198-BMW.patch, LUCENE-4198.patch,
> LUCENE-4198.patch, LUCENE-4198.patch, LUCENE-4198.patch, LUCENE-4198.patch,
> LUCENE-4198_flush.patch,
> TestSimpleTextPostingsFormat.asf.nightly.master.1466.consoleText.excerpt.txt,
> TestSimpleTextPostingsFormat.sarowe.jenkins.nightly.master.681.consoleText.excerpt.txt
>
>
> Subtask of LUCENE-4100.
> That's an example of something similar to impact indexing (though his
> implementation currently stores a max for the entire term, the problem is the
> same).
> We can imagine other similar algorithms too: I think the codec API should be
> able to support these.
> Currently it really doesn't: Stefan worked around the problem by providing a
> tool to 'rewrite' your index; he passes the IndexReader and Similarity to it.
> But it would be better if we fixed the codec API.
> One problem is that the Postings writer needs to have access to the
> Similarity. Another problem is that it needs access to the term and
> collection statistics up front, rather than after the fact.
> This might have some cost (hopefully minimal), so I'm thinking of experimenting
> in a branch with these changes to see if we can make it work well.
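The writer-side problem described in the issue can be made concrete with a minimal Java sketch of what a postings writer would have to compute to index per-block maximum scores. It assumes a BM25-style scoring function standing in for the Similarity, with assumed parameters k1 = 1.2 and b = 0.75; `BlockImpacts`, `score`, `blockMaxScores` and `termMaxScore` are hypothetical names, not part of Lucene's codec API. The point it shows is that `score` cannot be evaluated without `docCount`, `docFreq` and `avgDocLength`, i.e. the statistics the writer would need up front.

```java
import java.util.Arrays;

public class BlockImpacts {

    // Hypothetical BM25-style similarity. It cannot produce a score without
    // collection statistics (docCount, avgDocLength) and the term's docFreq,
    // which is why a postings writer computing maxima needs them up front.
    static float score(int freq, int docLength, long docCount, long docFreq,
                       double avgDocLength) {
        final double k1 = 1.2, b = 0.75; // assumed BM25 parameters
        double idf = Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
        double lengthNorm = k1 * (1 - b + b * docLength / avgDocLength);
        return (float) (idf * freq * (k1 + 1) / (freq + lengthNorm));
    }

    // Max score of each fixed-size block of one term's postings
    // (freqs[i] and docLengths[i] describe the i-th matching document).
    static float[] blockMaxScores(int[] freqs, int[] docLengths, int blockSize,
                                  long docCount, double avgDocLength) {
        long docFreq = freqs.length; // one posting per matching document
        int numBlocks = (freqs.length + blockSize - 1) / blockSize;
        float[] maxes = new float[numBlocks];
        Arrays.fill(maxes, Float.NEGATIVE_INFINITY);
        for (int i = 0; i < freqs.length; i++) {
            float s = score(freqs[i], docLengths[i], docCount, docFreq, avgDocLength);
            maxes[i / blockSize] = Math.max(maxes[i / blockSize], s);
        }
        return maxes;
    }

    // A single max for the entire term (the coarser variant): simply the
    // largest of the per-block maxima.
    static float termMaxScore(float[] blockMaxes) {
        float max = Float.NEGATIVE_INFINITY;
        for (float m : blockMaxes) {
            max = Math.max(max, m);
        }
        return max;
    }

    public static void main(String[] args) {
        int[] freqs = {1, 5, 2, 1, 1, 3};
        int[] docLengths = {10, 10, 10, 10, 10, 10};
        float[] blockMaxes = blockMaxScores(freqs, docLengths, 3, 100, 10.0);
        System.out.println(Arrays.toString(blockMaxes)
            + " termMax=" + termMaxScore(blockMaxes));
    }
}
```

A single per-term maximum only lets a query skip the whole term, while per-block maxima let it skip individual blocks whose best document cannot compete.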
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)