[ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734398#action_12734398 ]
Adriano Crestani commented on LUCENE-1486:
------------------------------------------

{quote}
I propose doing this using the new QP implementation. (I can write the new javacc QP for this)
(this implies that the code will be in contrib in 2.9 and be part of core on 3.0)
{quote}

That would be good!

{quote}
Granted, the test fails for a reason other than the one for which I wanted it to fail. We can probably strike the test and leave a note saying phrase-within-a-phrase just does not make sense and is not supported.
{quote}

Cool, I agree to remove it. But I still don't see how a user can type a phrase inside a phrase with the current syntax definition. Can you give me an example?

{quote}
In brackets it's an OR - the brackets are used to suggest that the current phrase element at position X is composed of some choices that are evaluated as a subclause in the same way that in normal query logic sub-clauses are defined in brackets e.g. +a +(b OR c). There seems to be a reasonable logic to this. Ideally the ComplexPhraseQueryParser should explicitly turn this setting on while evaluating the bracketed innards of phrases just in case the base class has AND as the default.
{quote}

If we use the JavaCC code Luis suggested, we would already have a query parser that throws ParseExceptions whenever the user types an AND inside a phrase.

{quote}
OR,||,+, AND, && ..... ignored
{quote}

So we should throw an exception if any of these is found inside a phrase. It could confuse the user if we just ignore it.

{quote}
Question 2) Should these 2 queries behave the same when we fix the problem
// checkMatches("\"john -percival\"", "1"); // not logic doesn't work
// checkMatches("\"john (-percival)\"", "1"); // not logic doesn't work
I suppose there's an open question as to if the second example is legal (the brackets are unnecessary)
{quote}

Yes, the second is unnecessary, but I don't think it's illegal.
A user can type <(smith)> outside a phrase, so it makes sense to support it inside a phrase as well.

{quote}
Question 3)
checkMatches("\"jo* smith\"~2", "1,2,3,5"); // position logic works.
doc 6 is also returned, so this feature does not seem to be working. That looks like a bug related to slop factor?
{quote}

I have not checked yet, but I think it's working fine. The slop is the number of position moves the terms inside the phrase are allowed to make and still match the query. It matches doc 6 because the term <smith> moves two positions to the right and matches "johathon mary gomes smith". Two moves = slop 2 :)

{quote}
ANDs are ignored and turned into ORs (see earlier comments) but maybe a query parse error should be thrown to emphasise this.
{quote}

I think we could support AND as well, although I agree there are few cases where a user would need it. It would work as I explained before:

{quote}
What happens if I type "(query AND parser) lucene". In my point of view it is: "(query AND parser) AND_NEXT_TO lucene". Which means for me: find any document that contains the term 'query' and the term 'parser' in the position x, and the term 'lucene' in the position x+1. Is this the expected behaviour?
{quote}

> Wildcards, ORs etc inside Phrase queries
> ----------------------------------------
>
>                 Key: LUCENE-1486
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1486
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: QueryParser
>    Affects Versions: 2.4
>            Reporter: Mark Harwood
>            Assignee: Mark Harwood
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: ComplexPhraseQueryParser.java,
> junit_complex_phrase_qp_07_21_2009.patch,
> junit_complex_phrase_qp_07_22_2009.patch, LUCENE-1486.patch,
> LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch,
> TestComplexPhraseQuery.java
>
>
> An extension to the default QueryParser that overrides the parsing of
> PhraseQueries to allow more complex syntax e.g. wildcards in phrase queries.
> The implementation feels a little hacky - this is arguably better handled in
> QueryParser itself. This works as a proof of concept for much of the query
> parser syntax. Examples from the Junit test include:
>         checkMatches("\"j* smyth~\"", "1,2"); //wildcards and fuzzies are OK in phrases
>         checkMatches("\"(jo* -john) smith\"", "2"); // boolean logic works
>         checkMatches("\"jo* smith\"~2", "1,2,3"); // position logic works.
>
>         checkBadQuery("\"jo* id:1 smith\""); //mixing fields in a phrase is bad
>         checkBadQuery("\"jo* \"smith\" \""); //phrases inside phrases is bad
>         checkBadQuery("\"jo* [sma TO smZ]\" \""); //range queries inside phrases not supported
> Code plus Junit test to follow...
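
To make the slop reasoning in the reply to Question 3 concrete, here is a minimal sketch of the arithmetic for the query "jo* smith"~2 against the doc 6 text "johathon mary gomes smith". It is not Lucene code: the class SlopSketch and its methods are hypothetical names invented for illustration. It assumes a simplified model in which only in-order matches are considered and the required slop is the number of positions <smith> has to move to sit directly after the jo* term.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is NOT how Lucene scores sloppy phrases.
class SlopSketch {

    /**
     * Smallest number of position moves needed for secondTerm to sit directly
     * after a token starting with firstPrefix, considering in-order matches only.
     * Returns -1 if the document has no in-order occurrence of both.
     */
    static int requiredSlop(List<String> docTokens, String firstPrefix, String secondTerm) {
        int best = -1;
        for (int i = 0; i < docTokens.size(); i++) {
            if (!docTokens.get(i).startsWith(firstPrefix)) {
                continue; // stands in for the wildcard term jo*
            }
            for (int j = i + 1; j < docTokens.size(); j++) {
                if (!docTokens.get(j).equals(secondTerm)) {
                    continue;
                }
                int moves = j - i - 1; // tokens sitting between the two terms
                if (best < 0 || moves < best) {
                    best = moves;
                }
            }
        }
        return best;
    }

    /** True if the two-term phrase matches within the given slop (simplified model). */
    static boolean matches(List<String> docTokens, String firstPrefix, String secondTerm, int slop) {
        int needed = requiredSlop(docTokens, firstPrefix, secondTerm);
        return needed >= 0 && needed <= slop;
    }

    public static void main(String[] args) {
        List<String> doc6 = Arrays.asList("johathon", "mary", "gomes", "smith");
        System.out.println(requiredSlop(doc6, "jo", "smith")); // prints 2
        System.out.println(matches(doc6, "jo", "smith", 2));   // prints true
        System.out.println(matches(doc6, "jo", "smith", 1));   // prints false
    }
}
```

Under these assumptions doc 6 needs exactly two moves, so it matches with ~2 and would stop matching with ~1, which is consistent with the reading that the result is expected rather than a slop bug.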