[ 
https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734241#action_12734241
 ] 

Adriano Crestani commented on LUCENE-1486:
------------------------------------------

Hi Mark H.,

Thanks for the response, some comments inline:

{quote}
Correct, the "inner phrase" example was a term not a phrase. This is perhaps a 
better example:

checkBadQuery("\"jo* \"percival smith\" \""); //phrases inside phrases is bad
{quote}

I think you did not get what I meant. Even in your new example there is no 
inner phrase. It parses as a phrase <"jo* ">, followed by a term <percival>, 
followed by another term <smith>, and then an empty phrase <" ">. So, with your 
change, the junit passes, but for the wrong reason: the exception complains 
about the empty phrase, not about an inner phrase (I still don't see how you 
can type an inner phrase with the current syntax). I think it's not a big deal; 
I'm just trying to understand, and to flag a test that is probably wrong. I 
hope that is clear now; let me know if it is not.
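
To illustrate the reading above, here is a minimal sketch that pairs up double 
quotes from left to right, the way I describe it. It is not the QueryParser 
grammar and uses no Lucene classes; the class name QuoteSegments and the 
PHRASE/TERM labels are made up for this example, and escaped quotes are 
ignored.

```java
import java.util.ArrayList;
import java.util.List;

final class QuoteSegments {

    // Pairs quotes left to right: text between an opening and a closing quote
    // becomes one PHRASE, text outside quotes is split into TERMs on whitespace.
    static List<String> split(String query) {
        List<String> out = new ArrayList<>();
        StringBuilder buf = new StringBuilder();
        boolean inPhrase = false;
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (c == '"') {
                if (inPhrase) {
                    out.add("PHRASE<" + buf + ">");   // closing quote
                } else {
                    addTerms(buf.toString(), out);    // opening quote
                }
                buf.setLength(0);
                inPhrase = !inPhrase;
            } else {
                buf.append(c);
            }
        }
        if (inPhrase) {
            out.add("UNTERMINATED<" + buf + ">");     // odd number of quotes
        } else {
            addTerms(buf.toString(), out);
        }
        return out;
    }

    // Splits unquoted text on whitespace and skips empty pieces.
    private static void addTerms(String text, List<String> out) {
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add("TERM<" + t + ">");
            }
        }
    }

    public static void main(String[] args) {
        // The query string from the checkBadQuery example.
        System.out.println(split("\"jo* \"percival smith\" \""));
        // prints [PHRASE<jo* >, TERM<percival>, TERM<smith>, PHRASE< >]
    }
}
```

With this pairing there is never a phrase nested in another phrase: the second 
quote always closes the first one, and the last segment is the empty phrase.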

{quote}
The Junit is currently the main form of documentation
{quote}

But that is not ideal, because the source code (junit code) is not included in 
the binary release. The ideal place would be the javadocs.

{quote}

    * Wildcard/fuzzy/range clauses can be used to define a phrase element (as 
opposed to simply single terms)
    * Brackets are used to group/define the acceptable variations for a given 
phrase element e.g. "(john OR jonathon) smith"
    * "AND" is irrelevant - there is effectively an implied "AND_NEXT_TO" 
binding all phrase elements

{quote}

Thanks, now it's clearer to me what is supported and what is not. I have some 
questions:

I understand this implicit AND_NEXT_TO operator between the queries inside the 
phrase. However, what happens if the user does not type any explicit boolean 
operator between two terms inside parentheses: "(query parser) lucene"? Is the 
operator between 'query' and 'parser' the implicit AND_NEXT_TO or the default 
boolean operator (usually OR)?

What happens if I type "(query AND parser) lucene"? In my view it is: 
"(query AND parser) AND_NEXT_TO lucene". To me that means: find any document 
that contains both the term 'query' and the term 'parser' at position x, and 
the term 'lucene' at position x+1. Is this the expected behaviour?
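
To make the two readings concrete, here is a small sketch over a toy 
positional document. It only models my interpretation of the question and says 
nothing about what ComplexPhraseQueryParser actually does. The class 
AndNextToSketch and its methods are hypothetical, and "|" is just my notation 
for several tokens sharing one position (for example, injected synonyms).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class AndNextToSketch {

    // Builds a toy document: one string per position, with "|" separating
    // tokens that share that position.
    static List<Set<String>> doc(String... positions) {
        List<Set<String>> out = new ArrayList<>();
        for (String p : positions) {
            out.add(new HashSet<>(Arrays.asList(p.split("\\|"))));
        }
        return out;
    }

    // OR reading: position x holds at least one of the terms, x+1 holds next.
    static boolean anyThenNext(List<Set<String>> d, Set<String> terms, String next) {
        for (int x = 0; x + 1 < d.size(); x++) {
            boolean any = false;
            for (String t : terms) {
                any |= d.get(x).contains(t);
            }
            if (any && d.get(x + 1).contains(next)) {
                return true;
            }
        }
        return false;
    }

    // AND reading: position x holds every one of the terms, x+1 holds next.
    static boolean allThenNext(List<Set<String>> d, Set<String> terms, String next) {
        for (int x = 0; x + 1 < d.size(); x++) {
            if (d.get(x).containsAll(terms) && d.get(x + 1).contains(next)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> group = new HashSet<>(Arrays.asList("query", "parser"));
        List<Set<String>> plain = doc("query", "lucene");
        List<Set<String>> stacked = doc("query|parser", "lucene");
        System.out.println(anyThenNext(plain, group, "lucene"));   // true
        System.out.println(allThenNext(plain, group, "lucene"));   // false
        System.out.println(allThenNext(stacked, group, "lucene")); // true
    }
}
```

Under the AND reading, a plain document like "query lucene" can never match, 
since only one token sits at each position; that is why I am asking which 
reading is intended.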

{quote}
1) Keep in core and improve error reporting and documentation
2) Move into "contrib" as experimental
3) Retain in core but simplify it to support only the simplest syntax (as in my 
Britney~ example)
4) Re-engineer the QueryParser.jj to support a formally defined syntax for 
acceptable "within phrase" operators e.g. *, ~, ( )
{quote}

1 is good, but I would like 4 too. Documentation and throwing the right 
exception are necessary. I just don't feel comfortable with the complex phrase 
query parser relying on the main query parser syntax: any change to the main 
one could easily break the complex phrase QP. Anyway, 4 may be done in the 
future :)

Mark M.:

{quote}
With the new info from Mark H, how hard would it be to create a new imp for the 
new parser that did a lot of this, in a more defined way? It seems you 
basically just want to be able to use multiterm queries and group/or things, 
right? We could even relax a little if we have to. This hasn't been released, 
so there is still a lot of wiggle room I think. But there does have to be a 
resolution with this and the new parser at some point either way.
{quote}

Yes, I am working on the new query parser code. I recently started to read and 
understand how the ComplexPhraseQP works, so that I could reproduce its 
behaviour using the new QP framework. I first tried to look at this QP as a 
user and could not figure out exactly what I can and cannot do with it. I think 
we are now hitting a big problem, which is documentation. That is why I started 
raising these questions: others could run into the same issues in the future.

So, yes, I can start coding an equivalent QP using the new QP framework; I'm 
just questioning and trying to understand everything before I start any coding. 
I don't wanna code anything that will throw ConcurrentModificationExceptions, 
that's why I'm raising these issues now, before I start moving it to the new QP.

Best Regards,
Adriano Crestani Campos


> Wildcards, ORs etc inside Phrase queries
> ----------------------------------------
>
>                 Key: LUCENE-1486
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1486
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: QueryParser
>    Affects Versions: 2.4
>            Reporter: Mark Harwood
>            Assignee: Mark Harwood
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: ComplexPhraseQueryParser.java, 
> junit_complex_phrase_qp_07_21_2009.patch, 
> junit_complex_phrase_qp_07_22_2009.patch, LUCENE-1486.patch, 
> LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, 
> TestComplexPhraseQuery.java
>
>
> An extension to the default QueryParser that overrides the parsing of 
> PhraseQueries to allow more complex syntax e.g. wildcards in phrase queries.
> The implementation feels a little hacky - this is arguably better handled in 
> QueryParser itself. This works as a proof of concept  for much of the query 
> parser syntax. Examples from the Junit test include:
>               checkMatches("\"j*   smyth~\"", "1,2"); //wildcards and fuzzies 
> are OK in phrases
>               checkMatches("\"(jo* -john)  smith\"", "2"); // boolean logic 
> works
>               checkMatches("\"jo*  smith\"~2", "1,2,3"); // position logic 
> works.
>               
>               checkBadQuery("\"jo*  id:1 smith\""); //mixing fields in a 
> phrase is bad
>               checkBadQuery("\"jo* \"smith\" \""); //phrases inside phrases 
> is bad
>               checkBadQuery("\"jo* [sma TO smZ]\" \""); //range queries 
> inside phrases not supported
> Code plus Junit test to follow...
