Thanks a lot for all your help. I guess this temporary fix will have to do until I have clearance to post some code. For the current index (that was last modified over a year ago), it works fine, but I know it's not properly done.
Thank you all very much, especially you, Mr Erickson.

Xavier Tô
B.Sc. in Computer Science and Software Engineering
[EMAIL PROTECTED]
(450)434-8905

----- Original Message -----
From: Erick Erickson <[EMAIL PROTECTED]>
Date: Friday, February 9, 2007 12:38 pm
Subject: Re: Question concerning Analyzers

> The query should be tokenized *by the query parser*. You shouldn't have to
> do the tokenizing yourself. When you print out the results of the parsing,
> you should see something like field:value1 field:value2, which are built up
> under the covers to be a BooleanQuery with a bunch of clauses.
>
> I think, though, I'm really at the end of any helpful suggestions I can come
> up with without looking at some code from both the indexing and querying.
> Otherwise, we'll just continue to mislead each other. If you haven't
> already, I strongly urge you to get a copy of Lucene In Action since that'll
> give you a much more thorough explication of tokenizing than I can.
>
> Best
> Erick
>
> On 2/9/07, Xavier To <[EMAIL PROTECTED]> wrote:
> >
> > Hey, thanks a lot for taking so much time here...
> >
> > I did check the analyzers and they appear to be the same... at least they
> > are the same class and the same package. I just noticed something: they
> > are using LowerCaseFilter.... I was going to say "could it be the source
> > of the numbers being ignored?" but it shouldn't, since they are indexed
> > (the modification of using WhitespaceAnalyzer during the search did return
> > the exact number of results for "2002", which is 5).
> >
> > As for the tokenizing, shouldn't a query be tokenized? It was already
> > like that, and all I did was modify it so it would use Lucene's tokenizing
> > methods... If a query shouldn't be tokenized, maybe tokenizing it is the
> > problem. If it should be tokenized, what am I doing wrong that forces me
> > to add a single blank after each token?
> > I mean, I don't understand what the analyzer has to do with the
> > tokenizing process... The reason why I add a blank is because the tokens
> > are getting appended into a string, and then the string is sent through
> > QueryParser.
> >
> > As I said, I don't really understand why the guy who made this search
> > engine didn't just send the query as a long string instead of tokenizing
> > it, but since it was working fine with alphabetical searches, I said to
> > myself "it must be the way to do it".
> >
> > Xavier Tô
> >
> > ----- Original Message -----
> > From: Erick Erickson <[EMAIL PROTECTED]>
> > Date: Thursday, February 8, 2007 5:13 pm
> > Subject: Re: Question concerning Analyzers
> >
> > > See below....
> > >
> > > On 2/8/07, Xavier To <[EMAIL PROTECTED]> wrote:
> > > >
> > > > Thanks for helping me.
> > > >
> > > > I don't really understand what you mean by my Tokenizer "corrects"
> > > > what the indexing analyzer did.
> > >
> > > You shouldn't have to change the tokens in the usual case to get the
> > > search to work right. You mentioned tokenizing the search string, but
> > > then having to add whitespaces back in. That step is the step that
> > > "corrects" what the analyzer did. I put "corrects" in quotes because it
> > > isn't really correcting anything, the analyzers are doing what they
> > > should. But if you have to make this manual change, you're trying to fix
> > > up the query string to match what the analyzer did at index time. Which
> > > will leave you correcting this, then that, then the other thing when it
> > > would be much better just to use the same analyzer if possible. I've
> > > just seen too many "oh, there's one more thing" statements in this
> > > situation.
> > >
> > > > By the way, the tokenizer we use is one provided in Lucene.
> > > > My guess is that the problem was that the analyzer was thought to be
> > > > the same by the guy who made the search engine, but the querying
> > > > analyzer is fetched inside a JAR by a bean. Could it be that this is
> > > > the problem?
> > >
> > > It shouldn't be if the same analyzer is fetched inside the bean. Can't
> > > you check what analyzer is used in both cases?
> > >
> > > Erick
> > >
> > > > Xavier Tô
> > > >
> > > > ----- Original Message -----
> > > > From: Erick Erickson <[EMAIL PROTECTED]>
> > > > Date: Thursday, February 8, 2007 12:51 pm
> > > > Subject: Re: Question concerning Analyzers
> > > >
> > > > > Well, you've proved that your problem is that the analyzer you're
> > > > > using when querying isn't matching what you use during indexing. I
> > > > > think that what you've done will lead you into significant problems
> > > > > down the road, though, as your tokenizer then has to "correct" for
> > > > > what the index analyzer did.
> > > > >
> > > > > What would probably be MUCH less work in the long run is to align
> > > > > the analyzer you use at query time with the analyzer you use at
> > > > > index time. You can use a PerFieldAnalyzerWrapper to handle
> > > > > different fields in different ways. Forget your custom tokenizer for
> > > > > the time being, just try using the same analyzer during searching
> > > > > that you used during indexing.
> > > > > You can use the QueryParser(String f, Analyzer a) form of the
> > > > > QueryParser, where the Analyzer is the same one you used when
> > > > > indexing. There are some circumstances where you want to use
> > > > > different analyzers when querying and when indexing, but don't go
> > > > > there unless you need to <G>....
> > > > >
> > > > > If that doesn't do what you want, what I'd really recommend is that
> > > > > you make your own custom Analyzer, built on, say,
> > > > > WhitespaceTokenizer and LowerCaseFilter. This is usually the way
> > > > > I've approached this kind of problem. And use *that* one at index
> > > > > and query time.
> > > > >
> > > > > There's an example in Lucene In Action, see the SynonymAnalyzer
> > > > > example. That example is MUCH more complex than you'll need <G>...
> > > > >
> > > > > Best
> > > > > Erick
> > > > >
> > > > > On 2/8/07, Xavier To <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > Hey!
> > > > > >
> > > > > > I tried using WhitespaceAnalyzer during the search and it works.
> > > > > > I refactored the tokenizing process so it uses TokenStream
> > > > > > instead of StringTokenizer and it works fine except for one
> > > > > > thing: the query "this is a test" becomes "thisisatest". I fixed
> > > > > > it by adding a space after each token except for the last one,
> > > > > > but is there a clean way to do it? I'm using WhitespaceTokenizer.
> > > > > >
> > > > > > Thanks a bunch!
> > > > > >
> > > > > > Xavier Tô
> > > > > >
> > > > > > ----- Original Message -----
> > > > > > From: Erick Erickson <[EMAIL PROTECTED]>
> > > > > > Date: Wednesday, February 7, 2007 4:28 pm
> > > > > > Subject: Re: Question concerning Analyzers
> > > > > >
> > > > > > > Then the analyzer you're using when parsing the query is
> > > > > > > stripping them. It must be different than the one you use when
> > > > > > > indexing somehow. At least that's the only explanation I can
> > > > > > > imagine....
> > > > > > >
> > > > > > > Perhaps, somehow, you are using a default analyzer when you
> > > > > > > parse a query? Or you aren't specifying the field when you
> > > > > > > query and thus a default is used? Or you are using a
> > > > > > > PerFieldAnalyzerWrapper and dropping through to the default?
> > > > > > > or ????
> > > > > > >
> > > > > > > Just for yucks, I'd try using WhitespaceAnalyzer on a query
> > > > > > > with something you *know* exists in the index for a particular
> > > > > > > field and work my way up to whatever your real problem is in
> > > > > > > small steps (since you can't post code <G>)......
> > > > > > >
> > > > > > > Best
> > > > > > > Erick
> > > > > > >
> > > > > > > On 2/7/07, Xavier To <[EMAIL PROTECTED]> wrote:
> > > > > > > >
> > > > > > > > Thanks Erik and Erick,
> > > > > > > >
> > > > > > > > I guess my question was rather unclear, but you guys
> > > > > > > > answered it all the same: it is impossible for an analyzer
> > > > > > > > to index something and have the same analyzer ignore the
> > > > > > > > thing indexed during a search.
> > > > > > > >
> > > > > > > > If it makes everything clearer, during indexing, numbers are
> > > > > > > > indexed, whether or not they are accompanied by letters
> > > > > > > > (2003 and 4wd are both indexed).
> > > > > > > > That's fine, since we want this. The problem occurs when I
> > > > > > > > try to search for them: they are ignored. I know they are
> > > > > > > > indexed because I ran through the index using Luke.
> > > > > > > >
> > > > > > > > Any thoughts regarding this problem?
> > > > > > > >
> > > > > > > > Xavier Tô
> > > > > > > >
> > > > > > > > ----- Original Message -----
> > > > > > > > From: Erik Hatcher <[EMAIL PROTECTED]>
> > > > > > > > Date: Wednesday, February 7, 2007 3:15 pm
> > > > > > > > Subject: Re: Question concerning Analyzers
> > > > > > > >
> > > > > > > > > There is no requirement that you use the same analyzer to
> > > > > > > > > search as you used to index. So, yes, you could certainly
> > > > > > > > > index things and ignore them during a search.
> > > > > > > > >
> > > > > > > > > Erik
> > > > > > > > >
> > > > > > > > > On Feb 7, 2007, at 2:10 PM, Xavier To wrote:
> > > > > > > > >
> > > > > > > > > > Hi, me again
> > > > > > > > > >
> > > > > > > > > > I'm still stuck with my search engine, but something
> > > > > > > > > > popped in my head: can an analyzer index something but
> > > > > > > > > > ignore it during a search? I'm asking this because now
> > > > > > > > > > that I've been searching for an answer, I've come to
> > > > > > > > > > think that I should redo the whole search engine, but I
> > > > > > > > > > don't want to reproduce the same error as we have now.
> > > > > > > > > > It would be stupid to accidentally redo the same
> > > > > > > > > > mistake. I still haven't received news from my seniors
> > > > > > > > > > about me posting code and all...
> > > > > > > > > >
> > > > > > > > > > Xavier Tô
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
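
For readers following the thread, the failure discussed above (terms such as "2002" are in the index, but a search for them finds nothing) can be reproduced with a toy model. The sketch below is plain Java and deliberately does not use Lucene: whitespaceLowerCase and lettersOnly are hypothetical stand-ins for two analyzers, assumed purely for illustration, and the small in-memory map is not how Lucene stores an index. It only shows that a query analyzed the same way as the indexed text finds the number, that a query-side analyzer which drops digits finds nothing, and that tokens rejoined with String.join keep their separating spaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

// Toy model only: none of these are Lucene classes, just plain-Java stand-ins.
final class AnalyzerMismatchDemo {

    // Stand-in for an analyzer that splits on whitespace and lower-cases tokens.
    static List<String> whitespaceLowerCase(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Stand-in for an analyzer that keeps only runs of letters,
    // so "2002" yields no token at all and "4wd" becomes "wd".
    static List<String> lettersOnly(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Builds a term -> document-id map using the given analyzer.
    static Map<String, Set<Integer>> buildIndex(
            List<String> docs, Function<String, List<String>> analyzer) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : analyzer.apply(docs.get(id))) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    // OR-query: a document matches if it contains any analyzed query term.
    static Set<Integer> search(Map<String, Set<Integer>> index, String query,
                               Function<String, List<String>> analyzer) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : analyzer.apply(query)) {
            Set<Integer> ids = index.get(term);
            if (ids != null) {
                hits.addAll(ids);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("Jeep 4wd 2002", "Sedan 2003", "Truck 2002 4WD");
        Map<String, Set<Integer>> index =
                buildIndex(docs, AnalyzerMismatchDemo::whitespaceLowerCase);

        // Same analyzer at index and query time: the number is found.
        System.out.println(search(index, "2002", AnalyzerMismatchDemo::whitespaceLowerCase)); // [0, 2]

        // Different analyzer at query time: the digits are dropped, nothing matches.
        System.out.println(search(index, "2002", AnalyzerMismatchDemo::lettersOnly)); // []

        // Rejoining tokens with a separator avoids the "thisisatest" effect.
        System.out.println(String.join(" ", whitespaceLowerCase("This is  a test"))); // this is a test
    }
}
```

In real Lucene code the equivalent move is the one Erick describes: hand the query parser the same Analyzer instance (or class) that was used when the index was built, rather than tokenizing the query by hand.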