[ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734148#action_12734148 ]
Mark Harwood commented on LUCENE-1486:
--------------------------------------

I'll try to catch up with some of the issues raised here:

bq. What do you mean on the last check by phrase inside phrase, I don't see any phrase inside a phrase

Correct, the "inner phrase" example was a term, not a phrase. This is perhaps a better example:
    checkBadQuery("\"jo* \"percival smith\" \""); //phrases inside phrases is bad

bq. I'm trying now to figure out what is supported

The JUnit test is currently the main form of documentation - unlike the XMLQueryParser (which has a DTD), there is no formal grammar that captures the logic.
Here is a basic summary of the syntax supported and how it differs from normal non-phrase use of the same operators:
* Wildcard/fuzzy/range clauses can be used to define a phrase element (as opposed to simply single terms)
* Brackets are used to group/define the acceptable variations for a given phrase element e.g. "(john OR jonathon) smith"
* "AND" is irrelevant - there is effectively an implied "AND_NEXT_TO" binding all phrase elements

To move this forward I would suggest we consider following one of these options:
1) Keep in core and improve error reporting and documentation
2) Move into "contrib" as experimental
3) Retain in core but simplify it to support only the simplest syntax (as in my Britney~ example)
4) Re-engineer the QueryParser.jj to support a formally defined syntax for acceptable "within phrase" operators e.g. *, ~, ( )

I think 1) is achievable if we carefully define where the existing parser breaks (e.g. ANDs and nested brackets).
2) is unnecessary if we can achieve 1).
3) would be a shame if we lost useful features for some very convoluted edge cases.
4) is beyond my JavaCC skills.

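To make the syntax summary above concrete, here is a minimal sketch in plain Java of the matching semantics it describes: each phrase element is either a single (possibly wildcarded) term or a bracketed OR group, and consecutive elements must match adjacent words (the implied "AND_NEXT_TO"). This is an illustration only and is not the ComplexPhraseQueryParser code. The class name PhraseSketch and its methods are hypothetical, it works on whitespace-split text rather than an index, and it ignores fuzzy (~), range, negation (-) and slop handling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Toy illustration only (hypothetical names); not Lucene's implementation. */
final class PhraseSketch {

    /** Converts one alternative such as "jo*" into a regex; '*' is the only wildcard handled. */
    static Pattern toPattern(String alternative) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : alternative.toCharArray()) {
            if (c == '*') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(".*");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString());
    }

    /** Splits a phrase into elements; a bracketed group becomes one element with several alternatives. */
    static List<List<Pattern>> parse(String phrase) {
        List<List<Pattern>> elements = new ArrayList<>();
        List<Pattern> group = null; // non-null while inside brackets
        for (String token : phrase.trim().split("\\s+")) {
            boolean opens = token.startsWith("(");
            boolean closes = token.endsWith(")");
            String term = token.replace("(", "").replace(")", "");
            if (opens) {
                group = new ArrayList<>();
            }
            if (group != null) {
                if (!term.equals("OR")) {
                    group.add(toPattern(term));
                }
                if (closes) {
                    elements.add(group);
                    group = null;
                }
            } else {
                elements.add(Arrays.asList(toPattern(term)));
            }
        }
        return elements;
    }

    /** True if any alternative of the element matches the whole word. */
    static boolean matchesAny(List<Pattern> alternatives, String word) {
        for (Pattern p : alternatives) {
            if (p.matcher(word).matches()) {
                return true;
            }
        }
        return false;
    }

    /** True if the elements match some run of adjacent words in the text (implied AND_NEXT_TO). */
    static boolean matches(String phrase, String text) {
        List<List<Pattern>> elements = parse(phrase);
        String[] words = text.trim().split("\\s+");
        for (int start = 0; start + elements.size() <= words.length; start++) {
            boolean all = true;
            for (int i = 0; i < elements.size() && all; i++) {
                all = matchesAny(elements.get(i), words[start + i]);
            }
            if (all) {
                return true;
            }
        }
        return false;
    }
}
```

With this sketch, PhraseSketch.matches("(john OR jonathon) smith", "jonathon smith") accepts the text, while "john q smith" is rejected because the two elements are no longer adjacent. Nested brackets and explicit ANDs, the cases where the existing parser is said to break, are deliberately left out.
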
> Wildcards, ORs etc inside Phrase queries
> ----------------------------------------
>
>                 Key: LUCENE-1486
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1486
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: QueryParser
>    Affects Versions: 2.4
>            Reporter: Mark Harwood
>            Assignee: Mark Harwood
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: ComplexPhraseQueryParser.java,
>                      junit_complex_phrase_qp_07_21_2009.patch,
>                      junit_complex_phrase_qp_07_22_2009.patch,
>                      LUCENE-1486.patch, LUCENE-1486.patch,
>                      LUCENE-1486.patch, LUCENE-1486.patch,
>                      TestComplexPhraseQuery.java
>
>
> An extension to the default QueryParser that overrides the parsing of
> PhraseQueries to allow more complex syntax e.g. wildcards in phrase queries.
> The implementation feels a little hacky - this is arguably better handled in
> QueryParser itself. This works as a proof of concept for much of the query
> parser syntax. Examples from the Junit test include:
>     checkMatches("\"j* smyth~\"", "1,2"); //wildcards and fuzzies are OK in phrases
>     checkMatches("\"(jo* -john) smith\"", "2"); // boolean logic works
>     checkMatches("\"jo* smith\"~2", "1,2,3"); // position logic works.
>
>     checkBadQuery("\"jo* id:1 smith\""); //mixing fields in a phrase is bad
>     checkBadQuery("\"jo* \"smith\" \""); //phrases inside phrases is bad
>     checkBadQuery("\"jo* [sma TO smZ]\" \""); //range queries inside phrases not supported
> Code plus Junit test to follow...