Re: Wildcard query with untokenized punctuation (again)

Mark Miller Thu, 14 Jun 2007 06:44:26 -0700

Gotta agree with Erick here... the best idea is just to preprocess the query before sending it to the QueryParser.


My first thought is always to get out the sledgehammer...


- Mark

Erick Erickson wrote:

Well, perhaps the simplest thing would be to pre-process the query and
turn the comma into whitespace before sending anything to the
query parser. I don't know how generalizable that sort of solution is in
your problem space though....
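Here is a minimal sketch of that pre-processing step. It assumes the comma is the only character that needs rewriting, and the class and method names are hypothetical. The rewritten string would then go to the query parser as usual:

```java
// Hypothetical pre-processing step: turn commas into spaces before the
// query string reaches the query parser, so "smith,ann*" is parsed as two
// clauses ("smith" and the prefix "ann*") instead of one prefix term.
class CommaPreprocessor {

    static String preprocess(String userQuery) {
        return userQuery.replace(',', ' ');
    }

    public static void main(String[] args) {
        System.out.println(preprocess("smith,ann*")); // prints smith ann*
    }
}
```

With the parser's default OR operator, the two clauses are optional. Lucene's QueryParser also offers setDefaultOperator(QueryParser.AND_OPERATOR), which should make both clauses required.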

Best
Erick

On 6/13/07, Renaud Waldura <[EMAIL PROTECTED]> wrote:

My very simple analyzer produces tokens made of digits and/or letters only.
Anything else is discarded. E.g. the input "smith,anna" gets tokenized as
2 tokens, first "smith" then "anna".
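The plain-Java sketch below shows the tokenization rule on its own. It assumes, as described, that every character other than a letter or digit acts as a separator. It does not use Lucene's classes, and the class name LetterOrDigitSplitter is hypothetical. In Lucene itself, a similar effect is commonly achieved by subclassing CharTokenizer and overriding isTokenChar to return Character.isLetterOrDigit(c).

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper illustrating the rule described above: tokens are
// maximal runs of letters and/or digits; every other character is dropped.
class LetterOrDigitSplitter {

    static List<String> tokenize(String input) {
        // Any run of characters that are not letters or decimal digits separates tokens.
        return Arrays.stream(input.split("[^\\p{L}\\p{Nd}]+"))
                .filter(s -> !s.isEmpty()) // split() can leave a leading empty string
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(tokenize("smith,anna")); // prints [smith, anna]
    }
}
```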

Say I have indexed documents that contained both "smith,anna" and
"smith,annanicole". To find them, I enter the query <<smith,ann*>>. The
stock Lucene 2.0 query parser produces a PrefixQuery for the single token
"smith,ann". This token doesn't exist in my index, and I don't get a
match.
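The failure follows from how a prefix query is matched. The query parser does not run wildcard and prefix terms through the analyzer, so the whole string "smith,ann" is compared against the terms actually stored in the index. The toy model below assumes the two documents produced only the terms smith, anna and annanicole in that field. It is not Lucene's implementation, and the names are hypothetical.

```java
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Toy model of prefix matching against a field's sorted term dictionary.
// It only illustrates why the unanalyzed prefix "smith,ann" finds nothing.
class PrefixMatchDemo {

    static List<String> matchPrefix(SortedSet<String> terms, String prefix) {
        // Terms are sorted, so all terms sharing the prefix form a contiguous run
        // starting at tailSet(prefix); stop at the first term that does not match.
        return terms.tailSet(prefix).stream()
                .takeWhile(t -> t.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>(List.of("anna", "annanicole", "smith"));
        System.out.println(matchPrefix(terms, "smith,ann")); // prints []
        System.out.println(matchPrefix(terms, "ann"));       // prints [anna, annanicole]
    }
}
```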

I have found some references to this:
http://www.nabble.com/Wildcard-query-with-untokenized-punctuation-tf3378386.html
but I don't understand how I can fix it. Comma-separated terms like this can
appear in any field; I don't think I can create an untokenized field.

Really what I would like in this case is for the comma to be considered
whitespace, and the query to be parsed to <<+smith +ann*>>. Any way I can
do that?
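One way to produce exactly that query string is to split the raw input wherever the analyzer would split, then mark each resulting piece as required with "+". This sketch assumes the only characters worth keeping are letters, digits and the wildcards * and ?. The class is hypothetical. This naive version ignores the rest of the query syntax (quotes, field names, boolean operators, parentheses), so it only suits plain keyword input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical rewrite of raw user input into a query string in which every
// piece is a required clause: "smith,ann*" becomes "+smith +ann*".
// Assumption: letters, digits, '*' and '?' are kept; everything else separates.
class RequiredClauseRewriter {

    private static boolean keep(char c) {
        return Character.isLetterOrDigit(c) || c == '*' || c == '?';
    }

    static String rewrite(String rawInput) {
        List<String> clauses = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : rawInput.toCharArray()) {
            if (keep(c)) {
                current.append(c);            // still inside a piece
            } else if (current.length() > 0) {
                clauses.add("+" + current);   // separator ends the piece
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            clauses.add("+" + current);       // flush the last piece
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(rewrite("smith,ann*")); // prints +smith +ann*
    }
}
```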

--Renaud

