Re: Strange behaviour of tokenizer with wildcard queries

Ian Lea Fri, 20 Sep 2013 05:42:39 -0700

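It's reasonable that a search for "block-major" won't find anything:
if the query is analyzed the same way it becomes block and major, and
major on its own was never indexed.  "block-major-57" should match.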

The split into block and major-57 happens because, per the javadocs
for ClassicTokenizer, it "Splits words at hyphens, unless there's a
number in the token, in which case the whole token is interpreted as a
product number and is not split.".  So I guess it splits on the first
hyphen, where neither side contains a number, but not on the second.
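
To make that concrete, here is a rough sketch of the rule in plain Java.
It is a toy model of the javadoc sentence quoted above, not Lucene code
and not the real ClassicTokenizer grammar; the class HyphenRuleSketch
and its methods are made-up names for illustration, and the matching is
simplified to "every query token must be an indexed token".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model only: this is NOT Lucene. It imitates the quoted javadoc rule
// ("split at hyphens unless there's a number in the token").
final class HyphenRuleSketch {

    // True if the piece contains at least one digit.
    static boolean hasDigit(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isDigit(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    // Split at hyphens, but glue neighbouring pieces back together
    // whenever a digit is involved (the "product number" case).
    static List<String> tokenize(String text) {
        String[] pieces = text.toLowerCase(Locale.ROOT).split("-");
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < pieces.length) {
            StringBuilder token = new StringBuilder(pieces[i]);
            boolean digitSeen = hasDigit(pieces[i]);
            while (i + 1 < pieces.length
                    && (digitSeen || hasDigit(pieces[i + 1]))) {
                token.append('-').append(pieces[i + 1]);
                digitSeen = true;
                i++;
            }
            if (token.length() > 0) {
                tokens.add(token.toString());
            }
            i++;
        }
        return tokens;
    }

    // Simplified matching: every token of the analyzed query must have
    // been indexed as-is. A real query parser does more than this.
    static boolean matches(String indexedText, String query) {
        return tokenize(indexedText).containsAll(tokenize(query));
    }

    // Prefix search (block*): the prefix is compared to the indexed
    // tokens directly, without being tokenized itself.
    static boolean matchesPrefix(String indexedText, String prefix) {
        for (String token : tokenize(indexedText)) {
            if (token.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String indexed = "block-major-57";
        System.out.println(tokenize(indexed));                  // [block, major-57]
        System.out.println(tokenize("block-major"));            // [block, major]
        System.out.println(matches(indexed, "block-major"));    // false
        System.out.println(matches(indexed, "block-major-57")); // true
        System.out.println(matchesPrefix(indexed, "block"));    // true
    }
}
```

Under those assumptions the query token major never lines up with the
indexed token major-57, which is the mismatch reported in the original
question, while the prefix block still finds the token block.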

ClassicAnalyzer/Tokenizer is general purpose and will never meet
everyone's requirements all the time.  You could try a different
analyzer, or build your own, which is what the javadoc recommends
when the tokenizer doesn't suit your application.

--
Ian.

On Fri, Sep 20, 2013 at 1:26 PM, Ramprakash Ramamoorthy
<youngestachie...@gmail.com> wrote:
> Sorry, hit the send button accidentally the last time. Please read below :
>
> Hello,
>
>             We're using Lucene 4.1. We have the word "*block-major-57*"
> indexed. Using the classic analyzer, we get the following tokens: *block* and
> *major-57*.
>
>              I search for *block-major*, and the document doesn't match.
> However, searching for *block** works perfectly. Is this a bug, or am I doing
> something wrong?
>
>
> --
> With Thanks and Regards,
> Ramprakash Ramamoorthy,
> Chennai, India.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
