[jira] [Commented] (LUCENE-4198) Allow codecs to index term impacts

Robert Muir (JIRA) Sun, 28 Jan 2018 19:52:15 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16342865#comment-16342865
 ]


Robert Muir commented on LUCENE-4198:
-------------------------------------

Sorry, this took me a long time (I've been traveling). I think the work is
impressive and clean, but I have a few thoughts, maybe for the future: it
would be nice if we didn't have to pass SimScorer down to this low level.

I guess I am suggesting we could explore an even lower-level API for
ImpactsEnum where the consumer (likely gonna be org.apache.lucene.search) is
the only one aware of the scoring function, so the codec API really exposes
raw data instead. I feel like this would map better to how the other codec
APIs work and give a bit better separation?

That being said, I think it's more important to make progress for now... the
API is labelled experimental, so we could improve it in the future. It's also
not stupid-complicated or anything, just two methods! Also, admittedly, I don't
have any real solid use cases for the "rawer" API besides CheckIndex and maybe
searchAfter... it's just more of a stretch idea.
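
To make the "rawer" idea above a bit more concrete, here is a minimal Java
sketch of the separation it describes. It is not the API in the patch and not
an existing Lucene interface: every name in it (RawImpactsSketch, Impact,
RawImpacts, ScoreFunction, maxScore) is hypothetical, and the scoring function
is a toy stand-in for whatever the search layer would really use. The only
point is that the codec side hands back raw (freq, norm) pairs, and only the
consumer applies a scoring function to them.

```java
import java.util.List;

public class RawImpactsSketch {

    /** Hypothetical raw record: a term frequency paired with an encoded norm. */
    public static final class Impact {
        public final int freq;
        public final long norm;

        public Impact(int freq, long norm) {
            this.freq = freq;
            this.norm = norm;
        }
    }

    /** Hypothetical codec-side view: raw pairs for a block of postings, no scoring involved. */
    public interface RawImpacts {
        List<Impact> impacts();
    }

    /** Hypothetical scoring function, known only to the search-side consumer. */
    public interface ScoreFunction {
        float score(int freq, long norm);
    }

    /** Search-side helper: derive a score upper bound for the block from the raw pairs. */
    public static float maxScore(RawImpacts raw, ScoreFunction fn) {
        float max = 0f;
        for (Impact impact : raw.impacts()) {
            max = Math.max(max, fn.score(impact.freq, impact.norm));
        }
        return max;
    }

    public static void main(String[] args) {
        // The "codec" exposes only raw data for one block of postings.
        RawImpacts block = () -> List.of(new Impact(3, 10), new Impact(5, 40));

        // Toy scoring function (an assumption, not a real Similarity):
        // grows with freq, shrinks as the norm grows.
        ScoreFunction toy = (freq, norm) -> freq / (freq + norm / 10f);

        System.out.println(maxScore(block, toy));
    }
}
```

With this shape, something like CheckIndex could validate the raw pairs
without knowing about any scoring function at all.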


> Allow codecs to index term impacts
> ----------------------------------
>
>                 Key: LUCENE-4198
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4198
>             Project: Lucene - Core
>          Issue Type: Sub-task
>          Components: core/index
>            Reporter: Robert Muir
>            Priority: Major
>         Attachments: LUCENE-4198-BMW.patch, LUCENE-4198.patch, 
> LUCENE-4198.patch, LUCENE-4198.patch, LUCENE-4198.patch, LUCENE-4198.patch, 
> LUCENE-4198_flush.patch
>
>
> Subtask of LUCENE-4100.
> That's an example of something similar to impact indexing (though his 
> implementation currently stores a max for the entire term, the problem is the 
> same).
> We can imagine other similar algorithms too: I think the codec API should be 
> able to support these.
> Currently it really doesn't: Stefan worked around the problem by providing a 
> tool to 'rewrite' your index, he passes the IndexReader and Similarity to it. 
> But it would be better if we fixed the codec API.
> One problem is that the Postings writer needs to have access to the 
> Similarity. Another problem is that it needs access to the term and 
> collection statistics up front, rather than after the fact.
> This might have some cost (hopefully minimal), so I'm thinking of experimenting 
> in a branch with these changes to see if we can make it work well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

