DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=27491>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=27491

[PATCH] Allowing '-'/'+' in terms

           Summary: [PATCH] Allowing '-'/'+' in terms
           Product: Lucene
           Version: unspecified
          Platform: Other
        OS/Version: Other
            Status: NEW
          Severity: Normal
          Priority: Other
         Component: QueryParser
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]


I suggest to change the definition of term character in QueryParser.jj
from 
| <#_TERM_CHAR: ( <_TERM_START_CHAR> | <_ESCAPED_CHAR> ) >
to
| <#_TERM_CHAR: ( <_TERM_START_CHAR> | <_ESCAPED_CHAR> | "-" | "+" ) >

As a result query parser will read '-' and '+' within words (such as tft-monitor
or Sysh1-1) as one term, which will be tokenized by the used analyzer
and end up in a term query or phrase query depending if it create one ore
more tokens.
So with StandardAnalyzer a query tft-monitor would get a phrase query "tft
monitor" and Sysh1-1 a term query for "Sysh1-1". 
Searching tft-monitor as a phrase "tft monitor" is not exact but the best
aproximation possible once you indexed tft-monitor as tokens tft and monitor.
Currently query parser interpret every '-' or '+' as operators, which means
that 'tft-monitor' gets parsed as tft AND NOT monitor, which probably isn't what
 the user wanted.
The effect of '-'/'+' not occuring within a word is not changed, so
tft -monitor will still search for 'tft AND NOT monitor'.

All regression tests pass with the change.

I didn't add a patch-file, because I think it's easy to change queryParser.jj by
hand.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to