[jira] [Commented] (LUCENE-6371) Improve Spans payload collection

Robert Muir (JIRA) Fri, 29 May 2015 05:09:28 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564678#comment-14564678
 ]


Robert Muir commented on LUCENE-6371:
-------------------------------------

{quote}
I think it's still useful though - I use it all the time!
{quote}

Yeah, but it's slow, with no easy chance of ever being faster. There is no 
simple bitset rewrite here like there is for other multiterm queries. 
Additionally, it has all the downsides of an enormous boolean query, but with 
proximity to boot. And this is very real, even for simple stuff like the 1-2 KB 
of RAM consumed per term due to the additional decompression buffers for prox. 
Maybe in the future you could optionally index prefix terms, but I can't 
imagine merging proximity etc. into a prefix-field for full-indexed-fields as a 
default; it seems complicated, slow, and space-consuming.

{quote}
It would be nice if you could restrict the number of SpanOr clauses it rewrites 
to, but that's a separate issue.
{quote}

+1, that is a great idea. We should really both do that and also add warnings 
to the javadocs about inefficiency. It has none today!

{quote}
If you really think that moving .getSpans() and .extractTerms() to SpanWeight 
doesn't gain anything, then I can back it out. But I think it does simplify the 
API and brings it more into line with our other standard queries. 
{quote}

I totally agree it has the value of consistency with other queries. But some of 
the APIs trying to do this are fairly complicated, yet at the same time still 
not really working: see below for more explanation.

{quote}
And I really don't see that exposing the termcontexts map on the SpanWeight 
constructor is any worse than exposing it directly in .getSpans(). In fact, I'd 
say that it's hiding it better - very few users of lucene are going to be 
looking at SpanWeights, as they're an implementation detail, but anyone using 
an IDE is going to be shown SpanQuery.getSpans() when they try and autocomplete 
on a SpanQuery object, and it's not something that most users need to worry 
about.
{quote}

It's actually terrible already: the motivation for this stuff was to try to 
speed up the turtle in question, SpanMultiTermQueryWrapper. The reason this 
stuff was exposed is that it could bring some relief to such crazy queries, by 
visiting each term in the term dictionary fewer than 3 times (rewrite, 
weight/idf, postings). But this was never quite right, for two reasons:
* Leniency: We can't enforce that we are doing the performant thing, because 
creation of weight/idf uses extractTerms(). So the SpanTermWeight inside the 
exclude portion of a SpanNot suddenly sees an unexpected term it has no 
termstate for. Maybe patches here removed this problem but forgot to fix the 
leniency in SpanTermWeight, as I see at least the code comment is gone.
* Incomplete: SpanMultiTermQueryWrapper still isn't reusing the termcontext 
from rewrite() and somehow passing it down to the rewritten spans. So the whole 
ugly thing isn't even totally working: it's just reducing the number of visits 
to the term dictionary from 3 down to 2, but it is stupid that it is not 1.
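
To make the 3-versus-1 visit count above concrete, here is a toy Java sketch. 
It is not Lucene code: CountingTermDict, lookupsWithoutCache and 
lookupsWithCache are hypothetical names invented for illustration, and a plain 
map of docFreqs stands in for the real term states. It only models the idea 
that the rewrite phase could look each term up once and hand the result to the 
weight/idf and postings phases.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: none of these types or methods are Lucene APIs. */
public class TermLookupSketch {

    /** Hypothetical stand-in for a term dictionary that counts every seek. */
    public static final class CountingTermDict {
        private final Map<String, Integer> docFreqs;
        public int lookups = 0;

        public CountingTermDict(Map<String, Integer> docFreqs) {
            this.docFreqs = new HashMap<>(docFreqs);
        }

        /** Looks a term up and records that the dictionary was visited. */
        public Integer seek(String term) {
            lookups++;
            return docFreqs.get(term);
        }
    }

    /** Every phase (rewrite, weight/idf, postings) seeks each term again. */
    public static int lookupsWithoutCache(CountingTermDict dict, List<String> terms) {
        for (int phase = 0; phase < 3; phase++) {
            for (String term : terms) {
                dict.seek(term);
            }
        }
        return dict.lookups;
    }

    /** Rewrite seeks each term once; later phases reuse the cached state. */
    public static int lookupsWithCache(CountingTermDict dict, List<String> terms) {
        Map<String, Integer> states = new HashMap<>();
        for (String term : terms) {
            states.put(term, dict.seek(term)); // rewrite: the only dictionary visit
        }
        long totalDocFreq = 0;
        for (String term : terms) {
            totalDocFreq += states.get(term);  // weight/idf: served from the cache
        }
        for (String term : terms) {
            states.get(term);                  // postings: served from the cache
        }
        return dict.lookups;
    }
}
```

With N expanded terms, the first method performs 3N dictionary seeks and the 
second performs N, which is the gap being complained about for large 
expansions.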


> Improve Spans payload collection
> --------------------------------
>
>                 Key: LUCENE-6371
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6371
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Paul Elschot
>            Assignee: Alan Woodward
>            Priority: Minor
>             Fix For: Trunk, 5.3
>
>         Attachments: LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch, 
> LUCENE-6371.patch
>
>
> Spin off from LUCENE-6308, see the comments there from around 23 March 2015.


