Well, you've proved that your problem is that the analyzer you're using when querying isn't matching what you use during indexing. I think that what you've done will lead you into significant problems down the road as your tokenizer then has to "correct" for what the index analyzer did though.
What would probably be MUCH less work in the long run is to align the analyzer you use at query time with the analyzer you use at index time. You can use a PerFieldAnalyzerWrapper to handle different fields in different ways. Forget your custom tokenizer for the time being, just try using the same analyzer during searching that you used during indexing. You can use the *QueryParser<file:///C:/lucene-2.0.0/docs/api/org/apache/lucene/queryParser/QueryParser.html#QueryParser%28java.lang.String,%20org.apache.lucene.analysis.Analyzer%29> *(String <http://java.sun.com/j2se/1.4/docs/api/java/lang/String.html> f, Analyzer<file:///C:/lucene-2.0.0/docs/api/org/apache/lucene/analysis/Analyzer.html> a) form of the QueryParser, where the Analyzer is the same one you used when indexing. There are some circumstances where you want to use different analyzers when querying and when indexing, but don't go there unless you need to <G>.... If that doesn't do what you want, I'd really recommend is that you make your own custom Analyzer, built on, say, WhitespaceTokenizer, LowerCaseFilter. This is usually the way I've approached this kind of problem. And use *that* one at index and query time. There's an example in Lucene In Action, see the SynonymAnalyzer example. That example is MUCH more complex than you'll need <G>... Best Erick On 2/8/07, Xavier To <[EMAIL PROTECTED]> wrote:
Hey ! I tried using WhitespaceAnalyzer during the search and it works. I refactored the tokenizing process so it uses TokenStream instead of StringTokenizer and it works fine for one thing : the query "this is a test" becomes "thisisatest". I fixed it by adding a space after each token except for the last one, but is there a clean way to do it ? I'm using WhitespaceTokenizer. Thanks a bunch ! Xavier Tô Bacc. en Informatique et Génie Logiciel [EMAIL PROTECTED] (450)434-8905 ----- Message d'origine ----- De: Erick Erickson <[EMAIL PROTECTED]> Date: Mercredi, Février 7, 2007 4:28 pm Objet: Re: Re : Re: Question concerning Analyzers > Then the analyzer you're using when parsing the query is stripping > them. It > must be different than the one you use when indexing somehow. At least > that's the only explanation I can imagine.... > > Perhaps, somehow, you are using a default analyzer when you parse a > query?Or you aren't specifying the field when you query and thus a > default is > used? Or you are using a PerFieldAnalyzerWrapper and dropping > through to the > default? or ???? > > Just for yucks, I'd try using WhitespaceAnalyzer on a query with > somethingyou *know* exists in the index for a particular field and > work my way up to > whatever your real problem is in small steps (since you can't post > code<G>)...... > > Best > Erick > > On 2/7/07, Xavier To <[EMAIL PROTECTED]> wrote: > > > > Thanks Erik and Erick, > > > > I guess my question was rather unclear, but you guys answered it > all the > > same : it is impossible for an analyzer to index something and > having the > > same analyzer ignore the thing indexed during a search. > > > > If it makes everything clearer, during indexation, numbers are > indexed,> whether or not they are accompanied by letters ( 2003 and > 4wd are both > > indexed). That's fine, since we want this. The problem occurs > when I try to > > search for them : They are ignored. I know they are indexed > because I ran > > through the index using Luke. > > > > Any thoughts regarding this problem ? > > > > Xavier Tô > > Bacc. en Informatique et Génie Logiciel > > [EMAIL PROTECTED] > > (450)434-8905 > > > > ----- Message d'origine ----- > > De: Erik Hatcher <[EMAIL PROTECTED]> > > Date: Mercredi, Février 7, 2007 3:15 pm > > Objet: Re: Question concerning Analyzers > > > > > There is no requirement that you use the same analyzer to > search as > > > > > > you used to index. So, yes, you could certainly index things and > > > ignore them during a search. > > > > > > Erik > > > > > > > > > On Feb 7, 2007, at 2:10 PM, Xavier To wrote: > > > > > > > Hi, me again > > > > > > > > I'm still stuck with my search engine, but something popped > in my > > > > > > > head : Can an analyzer index something but ignore it during a > > > > search ? I'm asking this because now that I've been searching > for> > > > > > an answer, I've come to think that I should redo the whole > search> > > > > > engine, but I don't want to reproduce the same error as we have > > > > now. It would be stupid to accidentaly redo the same mistake. I > > > > still haven't received news from my seniors about me posting > code> > > > > > and all... > > > > > > > > Xavier Tô > > > > Bacc. en Informatique et Génie Logiciel > > > > [EMAIL PROTECTED] > > > > (450)434-8905 > > > > > > > > > > > > > > > > -------------------------------------------------------------- > ---- > > > --- > > > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > > > For additional commands, e-mail: java-user- > [EMAIL PROTECTED]> > > > > > > > ---------------------------------------------------------------- > ---- > > > - > > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > > > > > > > > ------------------------------------------------------------------ > --- > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]