DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT <http://nagoya.apache.org/bugzilla/show_bug.cgi?id=27491>. ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE.
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=27491

           Summary: [PATCH] Allowing '-'/'+' in terms
           Product: Lucene
           Version: unspecified
          Platform: Other
        OS/Version: Other
            Status: NEW
          Severity: Normal
          Priority: Other
         Component: QueryParser
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]

I suggest changing the definition of a term character in QueryParser.jj from

    | <#_TERM_CHAR: ( <_TERM_START_CHAR> | <_ESCAPED_CHAR> ) >

to

    | <#_TERM_CHAR: ( <_TERM_START_CHAR> | <_ESCAPED_CHAR> | "-" | "+" ) >

As a result, the query parser will read a '-' or '+' within a word (such as tft-monitor or Sysh1-1) as part of a single term. That term is then tokenized by the analyzer in use and ends up as a term query or a phrase query, depending on whether the analyzer creates one token or several. So with StandardAnalyzer, the query tft-monitor becomes the phrase query "tft monitor", and Sysh1-1 becomes a term query for "Sysh1-1". Searching for tft-monitor as the phrase "tft monitor" is not exact, but it is the best approximation possible once tft-monitor has been indexed as the tokens tft and monitor.

Currently the query parser interprets every '-' or '+' as an operator, which means that 'tft-monitor' gets parsed as tft AND NOT monitor, which probably isn't what the user wanted.

The effect of a '-' or '+' that does not occur within a word is unchanged, so tft -monitor will still search for 'tft AND NOT monitor'.

All regression tests pass with the change. I didn't attach a patch file because I think it's easy to change QueryParser.jj by hand.
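
To make the difference concrete, here is a rough sketch of the behaviour described above. It is written in Python and is not Lucene's parser: `split_clauses`, `toy_analyze`, and `build_query` are hypothetical names, the splitter ignores escapes, quotes, and every other special character in the real grammar, and `toy_analyze` is only a stand-in that happens to reproduce the two StandardAnalyzer results quoted in this report.

```python
import re


def split_clauses(query, allow_inner_signs):
    """Split a query into (modifier, term) pairs.

    allow_inner_signs=False mimics the current rule: '-' and '+' are never
    term characters, so they always start a new clause.
    allow_inner_signs=True mimics the proposed rule: '-' and '+' may appear
    inside a term, but still not at its start.
    """
    start = r"[^\s+\-]"                                  # a term cannot start with a sign
    rest = r"[^\s]" if allow_inner_signs else r"[^\s+\-]"
    pattern = re.compile(rf"([+\-]?)({start}{rest}*)")
    return [(m.group(1), m.group(2)) for m in pattern.finditer(query)]


def toy_analyze(term):
    """Hypothetical analyzer stand-in (NOT StandardAnalyzer).

    Assumption for illustration: a term containing a digit stays whole,
    any other term is split at '-' and '+'.
    """
    if any(ch.isdigit() for ch in term):
        return [term]
    return [t for t in re.split(r"[+\-]", term) if t]


def build_query(term):
    """One token gives a term query, several tokens give a phrase query."""
    tokens = toy_analyze(term)
    if len(tokens) == 1:
        return ("term", tokens[0])
    return ("phrase", " ".join(tokens))


if __name__ == "__main__":
    for q in ("tft-monitor", "tft -monitor", "Sysh1-1"):
        print(q, "| current:", split_clauses(q, False),
              "| proposed:", split_clauses(q, True))
    print(build_query("tft-monitor"))
    print(build_query("Sysh1-1"))
```

In this sketch a modifier of '-' corresponds to a prohibited (AND NOT) clause. Under the proposed rule only a sign at the start of a term acts as an operator, so tft -monitor still yields two clauses while tft-monitor yields one.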