[ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734398#action_12734398 ]

Adriano Crestani commented on LUCENE-1486:
------------------------------------------

{quote}
I propose doing this using the new QP implementation. (I can write the 
new JavaCC QP for this.)
(This implies that the code will be in contrib in 2.9 and become part of core 
in 3.0.)
{quote}

That would be good!

{quote}
Granted, the test fails for a reason other than the one for which I wanted it 
to fail.
We can probably strike the test and leave a note saying phrase-within-a-phrase 
just does not make sense and is not supported.
{quote}

Cool, I agree to remove it. But I still don't see how a user can type a phrase 
inside a phrase with the current syntax definition. Can you give me an example?

{quote}
In brackets it's an OR - the brackets are used to suggest that the current 
phrase element at position X is composed of some choices that are evaluated as 
a subclause in the same way that in normal query logic sub-clauses are defined 
in brackets e.g. +a +(b OR c). There seems to be a reasonable logic to this.

Ideally the ComplexPhraseQueryParser should explicitly turn this setting on 
while evaluating the bracketed innards of phrases just in case the base class 
has AND as the default.
{quote}
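To make the bracket-as-OR reading concrete, here is a minimal sketch. It is a hypothetical model, not Lucene's API or the ComplexPhraseQueryParser code: a phrase is a list of positional elements, each a set of alternative terms, and a bracketed group such as "(john johathon)" is one element whose alternatives are always OR'ed, whatever the parser's default operator is. The class and method names are made up for illustration.

```java
import java.util.List;
import java.util.Set;

/**
 * Hypothetical model (not Lucene's API): each phrase element is a set of
 * alternative terms. A bracketed group becomes one element whose
 * alternatives are OR'ed, independent of the parser's default operator.
 */
class OrPhraseMatcher {

    /** Returns true if the elements match consecutive positions somewhere in the document. */
    static boolean matches(List<Set<String>> phrase, List<String> docTokens) {
        for (int start = 0; start + phrase.size() <= docTokens.size(); start++) {
            boolean allMatch = true;
            for (int i = 0; i < phrase.size(); i++) {
                // OR semantics: any one alternative at this position is enough.
                if (!phrase.get(i).contains(docTokens.get(start + i))) {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                return true;
            }
        }
        return false;
    }
}
```

With the phrase "(john johathon) smith" modelled as List.of(Set.of("john", "johathon"), Set.of("smith")), the document "johathon smith" matches, while "john percival smith" does not, because this sketch requires exact adjacency (no slop).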

If we use the JavaCC-based implementation Luis suggested, we would already have 
a query parser that throws a ParseException whenever the user types an AND 
inside a phrase.

{quote}
OR, ||, +, AND, && ... ignored
{quote}

So we should throw an exception if any of these is found inside a phrase. 
Silently ignoring them could confuse the user.
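A rough sketch of that check, assuming it runs on the raw text between the phrase quotes before the phrase is parsed. This is hypothetical and not part of Lucene: IllegalArgumentException stands in for Lucene's ParseException, and the tokenizing is naive (whitespace-separated tokens only).

```java
import java.util.Set;

/**
 * Hypothetical pre-parse check (not part of Lucene): reject boolean
 * operators inside a phrase instead of silently ignoring them.
 * IllegalArgumentException stands in for Lucene's ParseException.
 */
class PhraseOperatorCheck {

    // Operators listed as currently ignored inside phrases.
    private static final Set<String> OPERATORS = Set.of("OR", "||", "+", "AND", "&&");

    static void validate(String phraseBody) {
        for (String raw : phraseBody.trim().split("\\s+")) {
            // Strip brackets so tokens like "(query" or "parser)" are compared bare.
            String token = raw.replace("(", "").replace(")", "");
            // '+' is usually written as a prefix, e.g. "+smith".
            if (OPERATORS.contains(token) || token.startsWith("+")) {
                throw new IllegalArgumentException(
                        "Operator '" + token + "' is not supported inside a phrase: \"" + phraseBody + "\"");
            }
        }
    }
}
```

"jo* smith" passes, while "(query AND parser) lucene" and "john +smith" are rejected. Note that "-" is deliberately not in the list, since NOT inside a phrase ("john -percival") is discussed below as something to support.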

{quote}
    Question 2)
    Should these 2 queries behave the same when we fix the problem?
    // checkMatches("\"john -percival\"", "1"); // not logic doesn't work
    // checkMatches("\"john (-percival)\"", "1"); // not logic doesn't work

I suppose there's an open question as to whether the second example is legal 
(the brackets are unnecessary)
{quote}

Yes, the brackets in the second are unnecessary, but I don't think it's illegal. 
The user can type <(smith)> outside a phrase, so it makes sense to support it 
inside one as well.
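One way to make both forms behave the same is to drop brackets that wrap a single clause before building the phrase. The sketch below is hypothetical (the class name and the whitespace-based splitting are assumptions, not how the QP actually tokenizes); multi-term groups such as "(jo* -john)" are left untouched.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical normalisation step: brackets around a single clause,
 * e.g. "(-percival)" or "(smith)", are dropped so that
 * "john (-percival)" and "john -percival" yield the same clause list.
 * Assumes whitespace-separated clauses; multi-term groups are kept as-is.
 */
class RedundantBrackets {

    static List<String> clauses(String phraseBody) {
        List<String> out = new ArrayList<>();
        for (String token : phraseBody.trim().split("\\s+")) {
            if (token.length() > 2 && token.startsWith("(") && token.endsWith(")")) {
                // A single clause wrapped in brackets: keep only the clause.
                out.add(token.substring(1, token.length() - 1));
            } else {
                out.add(token);
            }
        }
        return out;
    }
}
```

Both "john (-percival)" and "john -percival" reduce to the clause list [john, -percival].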

{quote}
    Question 3)
    checkMatches("\"jo* smith\"~2", "1,2,3,5"); // position logic works.
    Doc 6 is also returned, so this feature does not seem to be working.

That looks like a bug related to the slop factor?
{quote}

I have not checked yet, but I think it's working fine. The slop is the number of 
position moves allowed between the terms of the phrase for a document to still 
match the query. Doc 6 matches because the term <smith> only needs to move two 
positions to the right to match "johathon mary gomes smith". Two moves = slop 2 :)
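The slop arithmetic can be checked with a small sketch. It assumes an in-order two-term phrase (as when "jo* smith"~2 is rewritten into an ordered span query), where the slop a document needs is simply the number of positions between the first and second term. The class and method names are hypothetical, and the wildcard is modelled as a plain prefix test.

```java
import java.util.List;
import java.util.function.Predicate;

/**
 * Minimal sketch, assuming an in-order two-term phrase: the slop a document
 * needs is the number of intervening positions between a token matching the
 * first term and the next token matching the second term.
 */
class PhraseSlop {

    /** Smallest slop at which first-term-then-second-term matches, or -1 if never. */
    static int requiredSlop(List<String> tokens, Predicate<String> first, Predicate<String> second) {
        int best = -1;
        for (int i = 0; i < tokens.size(); i++) {
            if (!first.test(tokens.get(i))) {
                continue;
            }
            for (int j = i + 1; j < tokens.size(); j++) {
                if (second.test(tokens.get(j))) {
                    int slop = j - i - 1; // number of positions in between
                    if (best < 0 || slop < best) {
                        best = slop;
                    }
                    break; // the nearest following match is the cheapest for this i
                }
            }
        }
        return best;
    }
}
```

For "johathon mary gomes smith" with first = t -> t.startsWith("jo") and second = t -> t.equals("smith"), the required slop is 2, so the document is accepted by "jo* smith"~2. A document matches at slop N when the result is between 0 and N.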

{quote}
ANDs are ignored and turned into ORs (see earlier comments) but maybe a query 
parse error should be thrown to emphasise this.
{quote}

I think we could also support AND, although I agree there are few cases where a 
user would need it. It would work as I explained before:

{quote}
What happens if I type "(query AND parser) lucene"? From my point of view it is 
"(query AND parser) AND_NEXT_TO lucene", which to me means: find any document 
that contains the term 'query' and the term 'parser' at position x, and the 
term 'lucene' at position x+1. Is this the expected behaviour?
{quote}
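A minimal sketch of that AND_NEXT_TO reading, under a loudly stated assumption: each document position is modelled as the set of terms indexed there, which only holds more than one term when tokens are stacked at the same position (for example by a synonym filter). That is why the AND case is rare in practice. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Set;

/**
 * Hypothetical model of "(query AND parser) AND_NEXT_TO lucene".
 * Each document position holds the set of terms indexed at it.
 * The AND group needs all of its terms at position x, and the next
 * element must occur at position x + 1.
 */
class AndNextTo {

    static boolean matches(List<Set<String>> positions, Set<String> andGroup, String next) {
        for (int x = 0; x + 1 < positions.size(); x++) {
            if (positions.get(x).containsAll(andGroup) && positions.get(x + 1).contains(next)) {
                return true;
            }
        }
        return false;
    }
}
```

A document where 'query' and 'parser' are stacked at position 0 and 'lucene' is at position 1 matches; "query parser lucene" as three ordinary consecutive tokens does not, since no single position holds both AND terms.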


> Wildcards, ORs etc inside Phrase queries
> ----------------------------------------
>
>                 Key: LUCENE-1486
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1486
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: QueryParser
>    Affects Versions: 2.4
>            Reporter: Mark Harwood
>            Assignee: Mark Harwood
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: ComplexPhraseQueryParser.java, 
> junit_complex_phrase_qp_07_21_2009.patch, 
> junit_complex_phrase_qp_07_22_2009.patch, LUCENE-1486.patch, 
> LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, 
> TestComplexPhraseQuery.java
>
>
> An extension to the default QueryParser that overrides the parsing of 
> PhraseQueries to allow more complex syntax e.g. wildcards in phrase queries.
> The implementation feels a little hacky - this is arguably better handled in 
> QueryParser itself. This works as a proof of concept for much of the query 
> parser syntax. Examples from the Junit test include:
>               checkMatches("\"j*   smyth~\"", "1,2"); //wildcards and fuzzies are OK in phrases
>               checkMatches("\"(jo* -john)  smith\"", "2"); // boolean logic works
>               checkMatches("\"jo*  smith\"~2", "1,2,3"); // position logic works.
>               
>               checkBadQuery("\"jo*  id:1 smith\""); //mixing fields in a phrase is bad
>               checkBadQuery("\"jo* \"smith\" \""); //phrases inside phrases is bad
>               checkBadQuery("\"jo* [sma TO smZ]\" \""); //range queries inside phrases not supported
> Code plus Junit test to follow...

-- 
This message is automatically generated by JIRA.

