[ https://issues.apache.org/jira/browse/LUCENE-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16352218#comment-16352218 ]
Adrien Grand commented on LUCENE-4198:
--------------------------------------

This change had an impact on the nightly benchmarks:
 - AndHighHigh: -3% [http://people.apache.org/~mikemccand/lucenebench/AndHighHigh.html]
 - AndHighMed: -4% [http://people.apache.org/~mikemccand/lucenebench/AndHighMed.html]
 - AndHighOrMedMed: -4% [http://people.apache.org/~mikemccand/lucenebench/AndHighOrMedMed.html]
 - AndMedOrHighHigh: -5% [http://people.apache.org/~mikemccand/lucenebench/AndMedOrHighHigh.html]

However, indexing speed and term queries seem unaffected, and phrase queries only show a very minor slowdown (~1%), which might be noise.

I'll look into it later, but I think this is an acceptable slowdown given that it allows us to speed up top-k queries.

> Allow codecs to index term impacts
> ----------------------------------
>
>                 Key: LUCENE-4198
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4198
>             Project: Lucene - Core
>          Issue Type: Sub-task
>          Components: core/index
>            Reporter: Robert Muir
>            Priority: Major
>             Fix For: master (8.0)
>
>         Attachments: LUCENE-4198-BMW.patch, LUCENE-4198.patch,
> LUCENE-4198.patch, LUCENE-4198.patch, LUCENE-4198.patch, LUCENE-4198.patch,
> LUCENE-4198_flush.patch,
> TestSimpleTextPostingsFormat.asf.nightly.master.1466.consoleText.excerpt.txt,
> TestSimpleTextPostingsFormat.sarowe.jenkins.nightly.master.681.consoleText.excerpt.txt
>
>
> Subtask of LUCENE-4100.
> That's an example of something similar to impact indexing (though his
> implementation currently stores a max for the entire term, the problem is the
> same).
> We can imagine other similar algorithms too: I think the codec API should be
> able to support these.
> Currently it really doesn't: Stefan worked around the problem by providing a
> tool to 'rewrite' your index, to which he passes the IndexReader and Similarity.
> But it would be better if we fixed the codec API.
> One problem is that the Postings writer needs to have access to the
> Similarity.
> Another problem is that it needs access to the term and collection
> statistics up front, rather than after the fact.
> This might have some cost (hopefully minimal), so I'm thinking of experimenting
> in a branch with these changes to see if we can make it work well.
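
To illustrate why the quoted description says the postings writer needs the Similarity: a stored maximum is only useful for top-k skipping if it can be turned into a score upper bound, and that requires the scoring function. The following is a minimal sketch under stated assumptions, not Lucene's codec API. The class `ImpactSketch`, its methods, and the toy scoring formula are all hypothetical and chosen only for illustration.

```java
// Hypothetical sketch: not Lucene's codec API. All names and the scoring
// formula below are assumptions made for illustration.
final class ImpactSketch {

    // Toy similarity: the score grows with term frequency and shrinks with the
    // length norm. A real Similarity would also use term/collection statistics.
    static double score(int freq, long norm) {
        return freq / (freq + 1.0 + norm / 10.0);
    }

    // Upper bound on the score of any document in a block of postings,
    // computed from the (freq, norm) pairs recorded for that block.
    static double maxScore(int[] freqs, long[] norms) {
        double max = 0.0;
        for (int i = 0; i < freqs.length; i++) {
            max = Math.max(max, score(freqs[i], norms[i]));
        }
        return max;
    }

    // A block can be skipped when even its best document cannot beat the
    // lowest score currently held in the top-k queue.
    static boolean canSkip(int[] freqs, long[] norms, double minCompetitiveScore) {
        return maxScore(freqs, norms) < minCompetitiveScore;
    }

    public static void main(String[] args) {
        int[] freqs = {1, 3};
        long[] norms = {10, 40};
        System.out.println(maxScore(freqs, norms));     // prints 0.375
        System.out.println(canSkip(freqs, norms, 0.9)); // prints true
    }
}
```

In this sketch the bound is computed from raw (freq, norm) pairs at query time. Storing a single precomputed maximum per term, as the description says the earlier implementation does, would instead require the scoring function and statistics at write time, which is the API problem the issue raises.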