"McGuigan, Colin" <[EMAIL PROTECTED]> wrote on 10/03/2007 11:04:37:
> You're entirely correct about the analyzer (I'm using one that breaks on
> non-alphanumeric characters, so all punctuation is ignored). To be
> honest, I hadn't thought about altering this, but I guess I could; just
> reticent that there might be unforeseen consequences.
>
> But I'm still curious about the original solution. Shouldn't it be
> possible to take "pagefile.*" and tokenize it (essentially throwing away
> the wildcard at the end)?

I can't see a consistent way to do something like this. Assume it is done;
we might have these steps:

  query text:            aa.bb.cc*
  remove '*':            aa.bb.cc
  analyze:               aa bb cc
  add the '*' back(?):   would it be  aa* bb* cc*
                         or           aa bb cc*

Doesn't make sense to me.

The reasoning behind not analyzing wildcard queries is also explained in
the FAQ: "Are Wildcard, Prefix, and Fuzzy queries case sensitive?"

Regards,
Doron

>
> --Colin McGuigan
>
> -----Original Message-----
> From: Doron Cohen [mailto:[EMAIL PROTECTED]
> Sent: Saturday, March 10, 2007 2:08 AM
> To: java-user@lucene.apache.org
> Subject: Re: Wildcard query with untokenized punctuation
>
> Hi Colin,
>
> Is it possible that you are using an analyzer that breaks words on
> non-letters? For instance SimpleAnalyzer? If so, the doc text:
>     pagefile.sys
> is indexed as two words:
>     pagefile sys
> At search time, the query text:
>     pagefile.sys
> is also parsed and tokenized into a two-word query:
>     pagefile sys
> but the query text:
>     pagefile.sys*
> is not analyzed (by design) and matches only words that start with:
>     pagefile.sys
> But there are no such words in the index, because it was indexed with
> breaking words on non-letters...
>
> Hopefully this gets you started... If this is the reason, you may want to
> use a different analyzer (see Wiki page "AnalysisParalysis").
>
> Otherwise, make sure you use the same analyzers at indexing and search
> ... and see the Lucene FAQ entry "Why am I getting no hits / incorrect
> hits?".
>
> If all this still fails, try to post here a simple code snippet showing how
> you index and how you search.
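
The mismatch discussed in this thread can be reproduced without Lucene.
The sketch below is plain Java, not Lucene code. Both helpers are
hypothetical stand-ins: tokenize() imitates an analyzer that splits on
non-letters and lowercases (roughly the behaviour attributed to
SimpleAnalyzer above), and prefixMatches() imitates a prefix query whose
text is compared to the indexed terms without being analyzed. It is only
meant to show why "pagefile.sys*" finds nothing in such an index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustration only: a simplified model of the behaviour described in the
// thread. None of these names come from Lucene.
class WildcardAnalysisSketch {

    // Assumed analyzer behaviour: split at every run of non-letter
    // characters and lowercase each resulting token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Assumed prefix-query behaviour: the raw prefix is NOT analyzed; it is
    // compared against the indexed terms exactly as typed.
    static boolean prefixMatches(List<String> indexedTerms, String rawPrefix) {
        for (String term : indexedTerms) {
            if (term.startsWith(rawPrefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Index time: "pagefile.sys" becomes two terms.
        List<String> index = tokenize("pagefile.sys");
        System.out.println(index);                                 // [pagefile, sys]

        // Query "pagefile.sys*": no indexed term starts with "pagefile.sys".
        System.out.println(prefixMatches(index, "pagefile.sys"));  // false

        // Query "pagefile*": matches the indexed term "pagefile".
        System.out.println(prefixMatches(index, "pagefile"));      // true

        // Analyzing "aa.bb.cc" yields three tokens, so there is no single
        // obvious place to put the '*' back.
        System.out.println(tokenize("aa.bb.cc"));                  // [aa, bb, cc]
    }
}
```

Under these assumptions, the only prefix queries that can match are ones
whose text lies entirely inside a single indexed term, which is why either
changing the analyzer or querying for "pagefile*" gives hits.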