Well, perhaps the simplest thing would be to pre-process the query and turn the comma into whitespace before sending anything to the query parser. I don't know how generalizable that sort of solution is in your problem space, though...
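Something along these lines, for example (an untested sketch against the Lucene 2.0 API; the field name "contents" and the WhitespaceAnalyzer are just stand-ins for whatever you already use):

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class CommaQueryDemo {
  public static void main(String[] args) throws ParseException {
    String raw = "smith,ann*";
    String cleaned = raw.replace(',', ' ');               // "smith ann*"

    Analyzer analyzer = new WhitespaceAnalyzer();         // stand-in for your analyzer
    QueryParser parser = new QueryParser("contents", analyzer);
    parser.setDefaultOperator(QueryParser.AND_OPERATOR);  // require every term
    Query query = parser.parse(cleaned);
    System.out.println(query.toString("contents"));       // prints: +smith +ann*
  }
}

With the default operator set to AND, the cleaned string parses to <<+smith +ann*>>, which I believe is what you're after.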
Best
Erick

On 6/13/07, Renaud Waldura <[EMAIL PROTECTED]> wrote:
My very simple analyzer produces tokens made of digits and/or letters only. Anything else is discarded. E.g. the input "smith,anna" gets tokenized as 2 tokens, first "smith" then "anna".

Say I have indexed documents that contained both "smith,anna" and "smith,annanicole". To find them, I enter the query <<smith,ann*>>. The stock Lucene 2.0 query parser produces a PrefixQuery for the single token "smith,ann". This token doesn't exist in my index, and I don't get a match.

I have found some references to this:
http://www.nabble.com/Wildcard-query-with-untokenized-punctuation-tf3378386.html
but I don't understand how I can fix it.

Comma-separated terms like this can appear in any field; I don't think I can create an untokenized field. Really what I would like in this case is for the comma to be considered whitespace, and the query to be parsed to <<+smith +ann*>>. Any way I can do that?

--Renaud
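P.S. The analyzer is roughly equivalent to this sketch built on Lucene 2.0's CharTokenizer (simplified; the class name is made up, not my real code):

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharTokenizer;
import org.apache.lucene.analysis.TokenStream;

// Keeps only letters and digits; anything else (commas, punctuation,
// whitespace) splits tokens and is discarded, so "smith,anna" comes out
// as the two tokens "smith" and "anna".
public class LetterOrDigitAnalyzer extends Analyzer {
  public TokenStream tokenStream(String fieldName, Reader reader) {
    return new CharTokenizer(reader) {
      protected boolean isTokenChar(char c) {
        return Character.isLetterOrDigit(c);
      }
    };
  }
}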