I'm confused about how escape characters work in Lucene. I'm using Lucene 1.3-dev1
with the StandardAnalyzer and QueryParser.
My documents have a field called 'path' with a value like "1102/a55407-2002nov2.xml".
This field is indexed but not tokenized. Here are the various queries I've tried and
their results:
1) When the query contains an unescaped dash, Lucene splits the query at the dash and
treats the remainder as a prohibited clause on the default field.
("path:1102/a55402-2002nov2.xml" is interpreted as "path:1102/a55402
-body:2002nov2.xml")
2) When a backslash is inserted before the dash (and the query does *not* contain a
wildcard), Lucene replaces the escaped dash with a space and produces a quoted phrase.
('path:1102/a55402\-2002nov2.xml' is interpreted as 'path:"1102/a55402 2002nov2.xml"';
note the space where the dash was.)
3) When a backslash is inserted before the dash (and the query *does* contain a
wildcard), Lucene interprets this literally, without any conversion.
("path:1102/55407\-2002nov*" is interpreted literally).
4) When a backslash is inserted before the dash and the dash is immediately followed by
a wildcard, Lucene reports an error. ('path:1102/a55407\-*' causes the lexical error:
Encountered <EOF> after :"")
Overall, it appears that a dash cannot be escaped. Is this true?
A post from yesterday suggests that a backslash cannot be escaped either. If that is
also true, which characters can be escaped?
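For reference, here is a minimal sketch of the client-side escaping I have been attempting before handing the string to QueryParser. The helper and class names are hypothetical (this is not part of Lucene 1.3), and the character list is an assumption taken from the Lucene query syntax documentation (+ - && || ! ( ) { } [ ] ^ " ~ * ? : \). I have not confirmed that 1.3-dev1 honours every one of these escapes, which is really what I'm asking above.

```java
// Hypothetical helper (not a Lucene API): inserts a backslash before every
// character that the query syntax documentation lists as special.
// "&&" and "||" are covered by escaping each '&' and '|' individually.
public class QueryEscapeSketch {

    // Assumed special-character list; not verified against 1.3-dev1.
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

    /** Returns s with a backslash inserted before every special character. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\'); // escape the query-syntax character
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String value = "1102/a55407-2002nov2.xml";
        // Prints the escaped form used in case 2: path:1102/a55407\-2002nov2.xml
        System.out.println("path:" + escape(value));
        // For a prefix search (cases 3 and 4), escape only the literal part
        // and append the wildcard afterwards so it stays a wildcard.
        System.out.println("path:" + escape("1102/a55407-") + "*");
    }
}
```

Escaping the whole value would also turn `*` into a literal asterisk, which is why the prefix example escapes the literal part first and appends the wildcard unescaped.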
Regards,
Terry