[ 
https://issues.apache.org/jira/browse/LUCENE-3371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15469582#comment-15469582
 ] 

Trejkaz commented on LUCENE-3371:
---------------------------------

To summarise some investigation I did into building this on top of 
SpanNotQuery (using its pre/post distances as a "not near" query): it turns 
out that this doesn't work, but I can't immediately see why.

My rewrite:

{code}
    @Override
    public SpanQuery rewrite(IndexReader reader) throws IOException
    {
        int nearQueriesCount = nearQueries.size();
        SpanQuery[] notNearClauses = new SpanQuery[nearQueriesCount];
        int pre = inOrder ? slop : 0;
        int post = slop;
        for (int i = 0; i < nearQueriesCount; i++)
        {
            notNearClauses[i] = new SpanNotQuery(mainQuery, nearQueries.get(i), pre, post);
        }
        return new SpanNotQuery(mainQuery, new SpanOrQuery(notNearClauses));
    }
{code}


i.e., for each query, create a "not near" clause, and then subtract the "not 
near" clauses from the main query clause to get the "near all" result.
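
The subtraction idea can be modelled outside Lucene on plain token positions. 
This is only a sketch under stated assumptions: SubtractionSketch and its 
methods are hypothetical names, every term is a single token, the main term 
differs from the near terms, and "near" means an absolute position difference 
of at most maxDiff. It says nothing about how SpanNotQuery itself counts its 
pre/post distances.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model, not part of Lucene: "near all" via set subtraction.
class SubtractionSketch {

    // All token positions at which the given term occurs.
    static Set<Integer> positionsOf(String[] tokens, String term) {
        Set<Integer> out = new TreeSet<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(term)) {
                out.add(i);
            }
        }
        return out;
    }

    // Positions of main that have NO occurrence of near within maxDiff positions.
    static Set<Integer> notNear(String[] tokens, String main, String near, int maxDiff) {
        Set<Integer> out = new TreeSet<>();
        Set<Integer> nearPositions = positionsOf(tokens, near);
        for (int m : positionsOf(tokens, main)) {
            boolean found = false;
            for (int n : nearPositions) {
                if (Math.abs(n - m) <= maxDiff) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                out.add(m);
            }
        }
        return out;
    }

    // main minus the union of the "not near" sets, mirroring the rewrite above.
    static Set<Integer> nearAll(String[] tokens, String main, String[] nears, int maxDiff) {
        Set<Integer> rejected = new TreeSet<>();
        for (String near : nears) {
            rejected.addAll(notNear(tokens, main, near, maxDiff));
        }
        Set<Integer> result = positionsOf(tokens, main);
        result.removeAll(rejected);
        return result;
    }
}
```

For the tokens "a b c x x a b" with maxDiff = 2, the second a (position 5) has 
no c in range, so it lands in the "not near c" set and only position 0 
survives the subtraction.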

This logic is apparently wrong, because this query:

{noformat}
  mainQuery = SpanTerm("content", "a")
  nearQueries = [
    SpanTerm("content", "b"),
    SpanTerm("content", "c")
  ]
  slop = 2
  inOrder = false
{noformat}

is expected to match this text:

{noformat}
a x b c x x x a
{noformat}

But instead, it does not match.
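
To make the expectation concrete, here is a brute-force check of the intended 
semantics that does not touch Lucene at all. It is a sketch only: 
NearAllSketch and matches are hypothetical names, terms are single tokens, 
matching is unordered, and distance is the absolute difference in token 
positions. Assuming slop counts the positions between two terms, slop = 2 
corresponds to a position difference of up to 3.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene.
class NearAllSketch {

    // Positions of mainTerm that have every term in nearTerms at a position
    // difference of at most maxDiff (unordered, single-token terms).
    static List<Integer> matches(String text, String mainTerm, String[] nearTerms, int maxDiff) {
        String[] tokens = text.split(" ");
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (!tokens[i].equals(mainTerm)) {
                continue;
            }
            boolean allNear = true;
            for (String near : nearTerms) {
                boolean found = false;
                for (int j = 0; j < tokens.length; j++) {
                    if (tokens[j].equals(near) && Math.abs(j - i) <= maxDiff) {
                        found = true;
                        break;
                    }
                }
                if (!found) {
                    allNear = false;
                    break;
                }
            }
            if (allNear) {
                hits.add(i);
            }
        }
        return hits;
    }
}
```

On the text above, matches("a x b c x x x a", "a", new String[] {"b", "c"}, 3) 
returns the first a (position 0), while a limit of 2 returns nothing, because 
c sits three positions after that a. So the expected match depends on which of 
the two counting conventions the distance parameters follow.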


> Support for a "SpanAndQuery" / "SpanAllNearQuery"
> -------------------------------------------------
>
>                 Key: LUCENE-3371
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3371
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/search
>            Reporter: Trejkaz
>
> I would like to parse queries like this:
> {noformat}
> a WITHIN 5 WORDS OF (b AND c)
> {noformat}
> This would match cases where both a b span and a c span are within 5 of the 
> same a span.
> The existing span query classes do not appear to be capable of doing this no 
> matter how they are combined, although replacing the AND with "WITHIN 10 OF" 
> (general rule is to double the first number) at least ensures that no hits 
> are lost (it just returns too many.)
> I'm not sure how the class would work, but it might be like this:
> {code}
>   Query q = new SpanAllNearQuery(a, new SpanQuery[] { b, c }, 5, false);
> {code}
> The difference from SpanNearQuery is that SpanNearQuery considers the entire 
> collection of terms as a single set to be found near each other, whereas this 
> query would consider each of the terms in the array relative to the first.


