When you say "we've tried the whitespace analyzer", did you mean for BOTH
indexing and searching? If you only use it for one of those, the terms in
the index won't match the terms in the query, and you'd see results like this.
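
To make the index-time vs. search-time point concrete, here is a toy model in
Python. It is not Lucene code: the two analyze functions are simplified
stand-ins I made up for "an analyzer that splits only on whitespace" and "an
analyzer that splits on punctuation and lower-cases", and the sample document
text is invented around the PFC-0234 example.

```python
import re

def whitespace_analyze(text):
    # Toy stand-in: split on whitespace only, keep case and punctuation.
    return text.split()

def splitting_analyze(text):
    # Toy stand-in: break on anything non-alphanumeric, then lower-case.
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", text) if t]

def build_index(docs, analyze):
    # Collect the set of terms the analyzer produces at index time.
    terms = set()
    for doc in docs:
        terms.update(analyze(doc))
    return terms

def term_hits(index_terms, query, analyze):
    # A query matches only if every analyzed query term is literally in the index.
    return all(t in index_terms for t in analyze(query))

docs = ["compound PFC-0234 assay"]  # invented sample text
index = build_index(docs, whitespace_analyze)

# Same analyzer on both sides: the query term is "PFC-0234", which is indexed.
same = term_hits(index, "PFC-0234", whitespace_analyze)

# Different analyzer at search time: the query becomes "pfc" and "0234",
# neither of which was ever put in the index.
mixed = term_hits(index, "PFC-0234", splitting_analyze)
```

In this sketch the index holds the single term "PFC-0234", so only the query
analyzed the same way finds it.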

And do you use Luke? It'll let you examine your index and see what's
*actually* in it. It's the first place I go when I don't get results I
expect....

See: http://www.getopt.org/luke/

What about capitalization? Lucene matches terms case-sensitively. Some of the
analyzers automatically lower-case (StandardAnalyzer and SimpleAnalyzer do)
and some don't (WhitespaceAnalyzer doesn't).
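
Here is a small sketch of how a case mismatch would make prefix searches come
up empty. Again, this is a toy model in Python rather than Lucene itself:
prefix_matches is a hypothetical helper, and PFC_0711 is an invented second
term. It assumes the query side ends up lower-cased while the index is not,
which is worth checking in your setup, since wildcard terms may be handled
differently from ordinary terms.

```python
def prefix_matches(index_terms, prefix):
    # Case-sensitive prefix match against the indexed terms.
    return sorted(t for t in index_terms if t.startswith(prefix))

# Terms as an analyzer that does NOT lower-case would store them.
indexed_as_is = {"PFC_0234", "PFC_0711"}  # PFC_0711 is made up for illustration

# The same terms as a lower-casing analyzer would store them.
indexed_lowercased = {t.lower() for t in indexed_as_is}

# Query case agrees with index case: both terms are found.
exact_case = prefix_matches(indexed_as_is, "PFC_0")

# Query was lower-cased but the index was not: nothing is found.
lowered_query = prefix_matches(indexed_as_is, "pfc_0")

# Both sides lower-cased: matches again.
lowered_both = prefix_matches(indexed_lowercased, "pfc_0")
```

If something like the middle case is what's happening, Luke will show it right
away, because the stored terms will be upper-case.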

If you're using the whitespace analyzer, I don't think you need to bother
transforming the hyphen into an underscore. It only splits on whitespace, so
PFC-0234 should stay a single token....

Hope this helps. Without more context I'm not sure what else to suggest...

Erick

On 8/7/06, Yiqun Eddie Cao <[EMAIL PROTECTED]> wrote:

Hi,

We are using Lucene in a chemistry database, and we are dealing with special
words containing both digits and letters of the English alphabet, such as
PFC-0234. To prevent Lucene from cutting the word in two, we have replaced
all dashes with underscores, so PFC-0234 is stored and indexed as PFC_0234
in the Lucene index. However, searches containing wildcard characters do not
work. For example, none of the following works: PFC_*, PFC*, PF*, PFC_0*,
PFC_02*, but PFC-0234 works. Can anyone tell me what is wrong here? We have
tried WhitespaceAnalyzer, but it's not working either.

Thanks,

Eddie

