[ 
https://issues.apache.org/jira/browse/LUCENE-5336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819275#comment-13819275
 ] 

Jack Conradson commented on LUCENE-5336:
----------------------------------------

Thanks for the feedback.

To answer the malformed input question:

If
"foo bar
is given as the query, the double quote will be dropped. If whitespace is 
enabled as an operator, term queries will be made for both 'foo' and 'bar'; 
otherwise a single term query will be made for 'foo bar'.

If
foo"bar
is given as the query, the double quote will be dropped, and term queries will 
be made for both 'foo' and 'bar'.

It's done this way because the parser makes a single pass over each query and 
only backtracks as far as the malformed input (in this case the extraneous 
double quote), so 'foo' is already part of the query tree by the time the 
quote is found.  The parser could be changed to make two passes and strip 
extraneous characters first, but I believe that only makes the code more 
complex without interpreting the query any better, since a malformed 
character gives no hint as to what the user really intended.
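
To make the two cases concrete, here is a rough standalone sketch of the 
behavior described above.  It is not the code from the patch: the class name 
MalformedQuoteSketch, the terms() helper, and the whitespaceIsOperator flag 
are made up for illustration, it produces plain strings instead of a query 
tree, and a simple look-ahead for the closing quote stands in for the 
parser's backtracking.

```java
import java.util.ArrayList;
import java.util.List;

public class MalformedQuoteSketch {

    /**
     * Splits a query into term strings in one left-to-right pass.
     * Hypothetical helper for illustration only; not the patch's parser.
     */
    public static List<String> terms(String query, boolean whitespaceIsOperator) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < query.length()) {
            char c = query.charAt(i);
            if (c == '"') {
                // A quote always ends the term collected so far.
                flush(current, out);
                int close = query.indexOf('"', i + 1);
                if (close >= 0) {
                    // Well-formed phrase: keep the quoted text as one unit.
                    String phrase = query.substring(i + 1, close);
                    if (!phrase.isEmpty()) {
                        out.add(phrase);
                    }
                    i = close + 1;
                } else {
                    // Unmatched quote: drop it and resume right after it.
                    // Terms already emitted (such as 'foo') are left alone.
                    i++;
                }
            } else if (whitespaceIsOperator && Character.isWhitespace(c)) {
                // Whitespace separates terms only when enabled as an operator.
                flush(current, out);
                i++;
            } else {
                current.append(c);
                i++;
            }
        }
        flush(current, out);
        return out;
    }

    // Emits the pending term, if any, and resets the buffer.
    private static void flush(StringBuilder current, List<String> out) {
        if (current.length() > 0) {
            out.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(terms("\"foo bar", true));   // [foo, bar]
        System.out.println(terms("\"foo bar", false));  // [foo bar]
        System.out.println(terms("foo\"bar", true));    // [foo, bar]
    }
}
```

In this sketch nothing emitted before the stray quote is revisited, which is 
why 'foo' stays a separate term in the second case regardless of the 
whitespace setting.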

I will try to post another patch today or tomorrow.

I plan to do the following:
* Fix the Javadoc comment
* Add more tests for random operators
* Rename the class to SimpleQueryParser and rename the package to .simple

> Add a simple QueryParser to parse human-entered queries.
> --------------------------------------------------------
>
>                 Key: LUCENE-5336
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5336
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Jack Conradson
>         Attachments: LUCENE-5336.patch
>
>
> I would like to add a new simple QueryParser to Lucene that is designed to 
> parse human-entered queries.  This parser will operate on an entire entered 
> query using a specified single field or a set of weighted fields (using term 
> boost).
> All features/operations in this parser can be enabled or disabled depending 
> on what is necessary for the user.  A default operator may be specified as 
> either 'MUST' representing 'and' or 'SHOULD' representing 'or.'  The 
> features/operations that this parser will include are the following:
> * AND specified as '+'
> * OR specified as '|'
> * NOT specified as '-'
> * PHRASE surrounded by double quotes
> * PREFIX specified as '*'
> * PRECEDENCE surrounded by '(' and ')'
> * WHITESPACE specified as ' ' '\n' '\r' and '\t' will cause the default 
> operator to be used
> * ESCAPE specified as '\' will allow operators to be used in terms
> The key differences between this parser and other existing parsers will be 
> the following:
> * No exceptions will be thrown, and errors in syntax will be ignored.  The 
> parser will do a best-effort interpretation of any query entered.
> * It uses minimal syntax to express queries.  All available operators are 
> single characters or pairs of single characters.
> * The parser is hand-written and in a single Java file making it easy to 
> modify.


