[ https://issues.apache.org/jira/browse/LUCENE-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand updated LUCENE-4198:
---------------------------------
    Attachment: LUCENE-4198.patch

OK, new iteration. I integrated LUCENE-8116, started fixing corner cases, and 
have been looking into ways to make the API nicer. The current take is to add 
{{PostingsEnum.setMinCompetitiveScore}}, which defaults to a no-op, and 
{{TermsEnum.topPostings(SimScorer)}}, which returns a postings enum that should 
be able to skip low-scoring documents and delegates to 
{{TermsEnum.postings(null, PostingsEnum.FREQS)}} by default.
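
To make the shape of that API concrete, here is a rough, self-contained sketch. It does not use Lucene's classes: {{SimScorer}}, {{PostingsEnum}} and {{TermsEnum}} below are simplified stand-ins that only borrow the names, and {{ArrayPostings}}, {{SkippingPostings}} and the demo methods are hypothetical helpers. It also filters documents one by one, whereas a codec with indexed impacts would presumably skip whole blocks using stored maximum scores.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins that only mirror the names in the patch description.
class TopPostingsSketch {

    /** Stand-in scorer: maps a term frequency to a score. */
    interface SimScorer {
        float score(int freq);
    }

    /** Stand-in postings iterator. */
    static abstract class PostingsEnum {
        static final int NO_MORE_DOCS = Integer.MAX_VALUE;

        abstract int nextDoc();

        abstract int freq();

        /** No-op by default: enums that cannot skip simply ignore the hint. */
        void setMinCompetitiveScore(float minScore) {}
    }

    /** Stand-in terms iterator, positioned on a single term. */
    static abstract class TermsEnum {
        abstract PostingsEnum postings();

        /** Default: delegate to the regular postings, with no skipping. */
        PostingsEnum topPostings(SimScorer scorer) {
            return postings();
        }
    }

    /** Hypothetical in-memory postings backed by parallel arrays. */
    static class ArrayPostings extends PostingsEnum {
        private final int[] docs;
        private final int[] freqs;
        private int idx = -1;

        ArrayPostings(int[] docs, int[] freqs) {
            this.docs = docs;
            this.freqs = freqs;
        }

        @Override
        int nextDoc() {
            idx++;
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }

        @Override
        int freq() {
            return freqs[idx];
        }
    }

    /** Hypothetical wrapper that drops documents scoring below the threshold. */
    static class SkippingPostings extends PostingsEnum {
        private final PostingsEnum in;
        private final SimScorer scorer;
        private float minScore = Float.NEGATIVE_INFINITY;

        SkippingPostings(PostingsEnum in, SimScorer scorer) {
            this.in = in;
            this.scorer = scorer;
        }

        @Override
        void setMinCompetitiveScore(float minScore) {
            this.minScore = minScore;
        }

        @Override
        int nextDoc() {
            int doc = in.nextDoc();
            while (doc != NO_MORE_DOCS && scorer.score(in.freq()) < minScore) {
                doc = in.nextDoc();
            }
            return doc;
        }

        @Override
        int freq() {
            return in.freq();
        }
    }

    private static final int[] DOCS = {1, 4, 7, 9};
    private static final int[] FREQS = {1, 5, 2, 8};

    private static List<Integer> collect(TermsEnum terms, float minScore) {
        SimScorer scorer = freq -> (float) freq; // toy scorer: score == freq
        PostingsEnum postings = terms.topPostings(scorer);
        postings.setMinCompetitiveScore(minScore);
        List<Integer> hits = new ArrayList<>();
        for (int doc = postings.nextDoc(); doc != PostingsEnum.NO_MORE_DOCS; doc = postings.nextDoc()) {
            hits.add(doc);
        }
        return hits;
    }

    /** Uses the default topPostings, so the threshold is ignored. */
    static List<Integer> demoPlain(float minScore) {
        return collect(new TermsEnum() {
            @Override
            PostingsEnum postings() {
                return new ArrayPostings(DOCS, FREQS);
            }
        }, minScore);
    }

    /** Overrides topPostings with an enum that honors the threshold. */
    static List<Integer> demoSkipping(float minScore) {
        return collect(new TermsEnum() {
            @Override
            PostingsEnum postings() {
                return new ArrayPostings(DOCS, FREQS);
            }

            @Override
            PostingsEnum topPostings(SimScorer scorer) {
                return new SkippingPostings(postings(), scorer);
            }
        }, minScore);
    }
}
```

The point of the two defaults is that callers can always ask for {{topPostings}} and set a threshold: codecs without impact data keep returning every document, and only codecs that opt in actually skip.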

I still need to work on tests and to stop creating a new IndexInput slice for 
every term at index time. I suppose I could implement {{getMergeInstance}} on 
{{Lucene70NormsProducer}} to reuse the same slice across invocations of 
{{getNorms}} on the same field.

I'll keep working on this over the next few days.

> Allow codecs to index term impacts
> ----------------------------------
>
>                 Key: LUCENE-4198
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4198
>             Project: Lucene - Core
>          Issue Type: Sub-task
>          Components: core/index
>            Reporter: Robert Muir
>         Attachments: LUCENE-4198.patch, LUCENE-4198.patch, 
> LUCENE-4198_flush.patch
>
>
> Subtask of LUCENE-4100.
> That's an example of something similar to impact indexing (though his 
> implementation currently stores a max for the entire term, the problem is the 
> same).
> We can imagine other similar algorithms too: I think the codec API should be 
> able to support these.
> Currently it really doesn't: Stefan worked around the problem by providing a 
> tool to 'rewrite' your index, he passes the IndexReader and Similarity to it. 
> But it would be better if we fixed the codec API.
> One problem is that the Postings writer needs to have access to the 
> Similarity. Another problem is that it needs access to the term and 
> collection statistics up front, rather than after the fact.
> This might have some cost (hopefully minimal), so I'm thinking to experiment 
> in a branch with these changes and see if we can make it work well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

