"McGuigan, Colin" <[EMAIL PROTECTED]> wrote on 10/03/2007 11:04:37:

> You're entirely correct about the analyzer (I'm using one that breaks on
> non-alphanumeric characters, so all punctuation is ignored).  To be
> honest, I hadn't thought about altering this, but I guess I could; I'm just
> hesitant because there might be unforeseen consequences.
>
> But I'm still curious about the original solution.  Shouldn't it be
> possible to take "pagefile.*" and tokenize it (essentially throwing away
> the wildcard at the end)?

I can't see a consistent way to do something like this. Assuming it were
done, we would have these steps:
 query text:  aa.bb.cc*
 remove '*':  aa.bb.cc
 analyze:     aa bb cc
 add the '*' back(?):
   would it be: aa* bb* cc*
   or: aa bb cc*
Neither choice is obviously right, so it doesn't make sense to me.
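
To make the ambiguity concrete, here is a minimal sketch in plain Python
(not Lucene code). The names `analyze` and `candidate_rewrites` are
hypothetical, and `analyze` only assumes an analyzer that lowercases and
breaks on non-alphanumeric ASCII characters:

```python
import re

def analyze(text):
    """Hypothetical stand-in for an analyzer: lowercase, then split on
    anything that is not an ASCII letter or digit."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def candidate_rewrites(query):
    """Return the two plausible ways to re-attach a trailing '*'
    after the rest of the query text has been analyzed."""
    if not query.endswith("*"):
        raise ValueError("expected a query ending in '*'")
    tokens = analyze(query[:-1])
    if not tokens:
        raise ValueError("nothing left after analysis")
    every_token = [t + "*" for t in tokens]        # wildcard on each token
    last_token = tokens[:-1] + [tokens[-1] + "*"]  # wildcard on the last token only
    return every_token, last_token

every_token, last_token = candidate_rewrites("aa.bb.cc*")
print(" ".join(every_token))  # aa* bb* cc*
print(" ".join(last_token))   # aa bb cc*
```

Both results are defensible readings of the same query text, and they
match different sets of documents.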

The reasoning behind not analyzing wildcard queries is also explained in
the FAQ: "Are Wildcard, Prefix, and Fuzzy queries case sensitive?"
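
The effect of leaving the prefix unanalyzed can be reproduced with a toy
index. This is only a sketch in plain Python, not Lucene; `analyze`,
`build_index` and `prefix_search` are hypothetical names, and the analyzer
is assumed to break on non-alphanumeric characters as in your setup:

```python
import re

def analyze(text):
    """Hypothetical analyzer: lowercase, split on non-alphanumeric ASCII."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def build_index(docs):
    """Collect the set of terms produced by analyzing every document."""
    terms = set()
    for doc in docs:
        terms.update(analyze(doc))
    return terms

def prefix_search(index_terms, query):
    """Match a 'prefix*' query against the indexed terms.
    The prefix is deliberately NOT analyzed, mirroring the behavior
    described above."""
    prefix = query.rstrip("*")
    return sorted(t for t in index_terms if t.startswith(prefix))

index_terms = build_index(["pagefile.sys"])
print(sorted(index_terms))                          # ['pagefile', 'sys']
print(prefix_search(index_terms, "pagefile.sys*"))  # []
print(prefix_search(index_terms, "pagefile*"))      # ['pagefile']
```

No indexed term contains a '.', so the raw prefix "pagefile.sys" can never
match, while "pagefile*" does.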

Regards,
Doron

>
> --Colin McGuigan
>
> -----Original Message-----
> From: Doron Cohen [mailto:[EMAIL PROTECTED]
> Sent: Saturday, March 10, 2007 2:08 AM
> To: java-user@lucene.apache.org
> Subject: Re: Wildcard query with untokenized punctuation
>
> Hi Colin,
>
> Is it possible that you are using an analyzer that breaks words on
> non-letters, for instance SimpleAnalyzer? If so, the doc text:
>    pagefile.sys
> is indexed as two words:
>   pagefile sys
> At search time, the query text:
>   pagefile.sys
> is also parsed and tokenized into a two-word query:
>   pagefile sys
> but the query text:
>   pagefile.sys*
> is not analyzed (by design) and matches only words that start with:
>   pagefile.sys
> But there are no such words in the index, because the text was broken
> into words on non-letters at indexing time...
>
> Hopefully this gets you started... If this is the reason, you may want
> to use a different analyzer (see the Wiki page "AnalysisParalysis").
>
> Otherwise, make sure you use the same analyzers at indexing and search ...
> and see the Lucene FAQ entry "Why am I getting no hits / incorrect hits?".
>
> If all this still fails, try to post here a simple code snippet showing
> how you index and how you search.
>

