[jira] [Commented] (LUCENE-6494) Make PayloadSpanUtil apply to other postings information

Robert Muir (JIRA) Thu, 21 May 2015 15:26:47 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555162#comment-14555162
 ]


Robert Muir commented on LUCENE-6494:
-------------------------------------

First of all: thank you very much for simplifying the APIs like you did!

And the idea is good, for many reasons. I still think our 
Collection<byte[]>-collecting payload queries should get out of town and be 
moved to sandbox. This is a good step towards allowing some of the crazier 
stuff to be e.g. refactored out of lucene-core and I really like that. And i 
like a different api for the "expert stuff" (payloads and offsets) because it 
still gives us a chance to look at cleanly integrating spans with other queries 
in the future. 

I am just super-worried about timing and stability.

There is a lot to do it seems, when i look at it, can we please mark this issue 
a blocker? Timing is not good here because [~anshumg] is planning to build a 
release.

* TermSpans still has commented out payload handling code.
* In NearSpansOrdered, I don't understand why we are doing various 
calculations, when you are not using the API (The no-op buffered stuff). Why 
not just have a null-check instead of that?
* There should be javadocs (e.g. explaining various integer parameters like 
requiredPostings)
* Should MatchDataIterator and MatchDataCollector really need to be in core 
lucene? (fwiw, i dont know how that PayloadSpanUtil survived so long).
* What is the semantics of Spans.collect(SpanCollector) in terms of lifecycle? 
I mean, i assume it shouldnt be called when doc = NO_MORE_DOCS, or when doc = 
-1, and so forth. And I also assume it shouldnt be called when position = -1 or 
position = NO_MORE_POSITIONS.  Ideally these rules are not just in the javadoc 
but also encoded in the state machine of AssertingSpans.
* I don't like 'requiredPostings' as the way this API is "enabled" in 
createWeight. There are two use-cases, expert payload use-cases, and retrieving 
offsets for e.g. highlighting. But as you see from NearSpansOrdered, these use 
cases can interfere with scoring algorithms. So I think its important that the 
use of this API is explicit, up-front, so it won't hinder scoring algorithms. 
Can we have a boolean instead that is more explicit ("expert proximity mode" or 
whatever we can come up with)? Alternatively a different-named method instead 
of createWeight, like createDebuggingWeight or whatever you want. Basically its 
kinda like explain() to me, except its still a per-segment search and not dog 
slow.


> Make PayloadSpanUtil apply to other postings information
> --------------------------------------------------------
>
>                 Key: LUCENE-6494
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6494
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>             Fix For: 5.2
>
>         Attachments: LUCENE-6494.patch, LUCENE-6494.patch, LUCENE-6494.patch, 
> LUCENE-6494.patch
>
>
> With the addition of SpanCollectors, we can now get arbitrary postings 
> information from SpanQueries.  PayloadSpanUtil does some rewriting to convert 
> non-span queries into SpanQueries so that it can collect payloads.  It would 
> be good to make this more generic, so that we can collect any postings 
> information from any query (without having to make invasive changes to 
> already optimized Scorers, etc).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-6494) Make PayloadSpanUtil apply to other postings information

Reply via email to