Re : Re: Re : Re: Re : Re: Re : Re: Question concerning Analyzers

Xavier To Fri, 09 Feb 2007 05:46:28 -0800

Hey, thanks a lot for taking so much time here...

I did check the analyzers and they appear to be the same... at least they are
the same class and the same package. I just noticed something: they are using
LowerCaseFilter. I was going to ask "could it be the source of the numbers
being ignored?" but it shouldn't be, since the numbers are indexed (switching
to WhitespaceAnalyzer during the search did return the exact number of results
for "2002", which is 5).

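As a rough illustration of why lowercasing alone should not lose numbers, here
is a minimal sketch in plain Java. The names AnalyzerMismatchSketch,
whitespaceLowercase and lettersOnly are made up for this example and are not
Lucene classes; they only imitate a whitespace tokenizer followed by a
lowercase filter, and a hypothetical query-side analyzer that keeps letters
only. Whether the real query-side analyzer behaves like the second one is an
assumption, not something confirmed in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy stand-ins for analyzers; these are NOT Lucene classes. */
final class AnalyzerMismatchSketch {

    /** Splits on whitespace and lowercases each token; digits are left untouched. */
    static List<String> whitespaceLowercase(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    /** Hypothetical query-side analyzer: keeps only runs of letters, so digits vanish. */
    static List<String> lettersOnly(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A non-letter ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(whitespaceLowercase("Jeep 4WD 2002")); // [jeep, 4wd, 2002]
        System.out.println(lettersOnly("Jeep 4WD 2002"));         // [jeep, wd]
    }
}
```

With these toy versions, "2002" survives the first pipeline but produces no
token at all in the second, which would look exactly like numbers being
"ignored" at search time even though they are in the index.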

As for the tokenizing, shouldn't a query be tokenized? It was already like
that, and all I did was modify it so it would use Lucene's tokenizing
methods... If a query shouldn't be tokenized, maybe tokenizing it is the
problem. If it should be tokenized, what am I doing wrong that forces me to
add a single blank after each token? I mean, I don't understand what the
analyzer has to do with the tokenizing process... The reason why I add a blank
is that the tokens are appended into a string, and then the string
is sent through QueryParser.
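
On the blank-after-each-token question, here is a minimal sketch of joining
the tokens with a single separator rather than appending a blank after each
one. TokenJoinSketch, tokenize and rebuild are hypothetical names; the
whitespace split merely stands in for the TokenStream loop and uses no Lucene
API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; the whitespace split stands in for a TokenStream loop. */
final class TokenJoinSketch {

    /** Collects the non-empty, whitespace-separated tokens of the query. */
    static List<String> tokenize(String query) {
        List<String> tokens = new ArrayList<>();
        for (String part : query.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    /** Rebuilds the query with exactly one blank between tokens and none at the ends. */
    static String rebuild(String query) {
        return String.join(" ", tokenize(query));
    }

    public static void main(String[] args) {
        System.out.println(rebuild("this   is a test")); // this is a test
    }
}
```

If the analyzer handed to QueryParser already tokenizes the same way as the
indexing analyzer, passing the original query string straight to the parser
may make this rebuilding step unnecessary.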

As I said, I don't really understand why the guy who made this search engine
didn't just send the query as one long string instead of tokenizing it, but
since it was working fine with alphabetical searches, I said to myself "it must
be the way to do it".

Xavier Tô
Bachelor's degree in Computer Science and Software Engineering
[EMAIL PROTECTED]
(450)434-8905

----- Original Message -----
From: Erick Erickson <[EMAIL PROTECTED]>
Date: Thursday, February 8, 2007 5:13 pm
Subject: Re: Re : Re: Re : Re: Re : Re: Question concerning Analyzers

> See below....
> 
> On 2/8/07, Xavier To <[EMAIL PROTECTED]> wrote:
> >
> > Thanks for helping me.
> >
> > I don't really understand what you mean by my Tokenizer "corrects" what
> > the indexing analyzer did.
> 
> 
> You shouldn't have to change the tokens in the usual case to get the
> search to work right. You mentioned tokenizing the search string, but then
> having to add whitespaces back in. That step is the step that "corrects"
> what the analyzer did. I put "corrects" in quotes because it isn't really
> correcting anything, the analyzers are doing what they should. But if you
> have to make this manual change, you're trying to fix up the query string to
> match what the analyzer did at index time. Which will leave you correcting
> this, then that, then the other thing when it would be much better just to
> use the same analyzer if possible. I've just seen too many "oh, there's one
> more thing" statements in this situation.
> 
> 
> > By the way, the tokenizer we use is one provided in Lucene. My guess is that
> > the problem was that the analyzer was thought to be the same by the guy who
> > made the search engine, but the querying analyzer is fetched inside a JAR by
> > a bean. Could it be that this is the problem?
> 
> 
> It shouldn't be if the same analyzer is fetched inside the bean. Can't you
> check what analyzer is used in both cases?
> 
> Erick
> 
> 
> > Xavier Tô
> > Bachelor's degree in Computer Science and Software Engineering
> > [EMAIL PROTECTED]
> > (450)434-8905
> >
> > ----- Original Message -----
> > From: Erick Erickson <[EMAIL PROTECTED]>
> > Date: Thursday, February 8, 2007 12:51 pm
> > Subject: Re: Re : Re: Re : Re: Question concerning Analyzers
> >
> > > Well, you've proved that your problem is that the analyzer you're
> > > using when querying isn't matching what you use during indexing. I
> > > think that what you've done will lead you into significant problems
> > > down the road as your tokenizer then has to "correct" for what the
> > > index analyzer did, though.
> > > What would probably be MUCH less work in the long run is to align the
> > > analyzer you use at query time with the analyzer you use at index
> > > time. You can use a PerFieldAnalyzerWrapper to handle different fields
> > > in different ways. Forget your custom tokenizer for the time being,
> > > just try using the same analyzer during searching that you used during
> > > indexing. You can use the
> > >
> > > QueryParser(String f, Analyzer a)
> > >
> > > form of the QueryParser, where the Analyzer is the same one you used
> > > when indexing. There are some circumstances where you want to use
> > > different analyzers when querying and when indexing, but don't go
> > > there unless you need to <G>....
> > >
> > > If that doesn't do what you want, what I'd really recommend is that you
> > > make your own custom Analyzer, built on, say, WhitespaceTokenizer and
> > > LowerCaseFilter. This is usually the way I've approached this kind of
> > > problem. And use *that* one at index and query time.
> > >
> > > There's an example in Lucene In Action, see the SynonymAnalyzer
> > > example. That example is MUCH more complex than you'll need <G>...
> > >
> > > Best
> > > Erick
> > >
> > > On 2/8/07, Xavier To <[EMAIL PROTECTED]> wrote:
> > > >
> > > > Hey !
> > > >
> > > > I tried using WhitespaceAnalyzer during the search and it works. I
> > > > refactored the tokenizing process so it uses TokenStream instead of
> > > > StringTokenizer and it works fine except for one thing: the query
> > > > "this is a test" becomes "thisisatest". I fixed it by adding a space
> > > > after each token except for the last one, but is there a clean way to
> > > > do it? I'm using WhitespaceTokenizer.
> > > >
> > > > Thanks a bunch !
> > > >
> > > > Xavier Tô
> > > > Bachelor's degree in Computer Science and Software Engineering
> > > > [EMAIL PROTECTED]
> > > > (450)434-8905
> > > >
> > > > ----- Original Message -----
> > > > From: Erick Erickson <[EMAIL PROTECTED]>
> > > > Date: Wednesday, February 7, 2007 4:28 pm
> > > > Subject: Re: Re : Re: Question concerning Analyzers
> > > >
> > > > > Then the analyzer you're using when parsing the query is stripping
> > > > > them. It must be different than the one you use when indexing
> > > > > somehow. At least that's the only explanation I can imagine....
> > > > >
> > > > > Perhaps, somehow, you are using a default analyzer when you parse a
> > > > > query? Or you aren't specifying the field when you query and thus a
> > > > > default is used? Or you are using a PerFieldAnalyzerWrapper and
> > > > > dropping through to the default? Or ????
> > > > >
> > > > > Just for yucks, I'd try using WhitespaceAnalyzer on a query with
> > > > > something you *know* exists in the index for a particular field and
> > > > > work my way up to whatever your real problem is in small steps
> > > > > (since you can't post code <G>)......
> > > > >
> > > > > Best
> > > > > Erick
> > > > >
> > > > > On 2/7/07, Xavier To <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > Thanks Erik and Erick,
> > > > > >
> > > > > > I guess my question was rather unclear, but you guys answered it
> > > > > > all the same: it is impossible for an analyzer to index something
> > > > > > and have the same analyzer ignore the indexed thing during a search.
> > > > > >
> > > > > > If it makes everything clearer, during indexing, numbers are
> > > > > > indexed, whether or not they are accompanied by letters (2003 and
> > > > > > 4wd are both indexed). That's fine, since we want this. The problem
> > > > > > occurs when I try to search for them: they are ignored. I know they
> > > > > > are indexed because I ran through the index using Luke.
> > > > > >
> > > > > > Any thoughts regarding this problem ?
> > > > > >
> > > > > > Xavier Tô
> > > > > > Bachelor's degree in Computer Science and Software Engineering
> > > > > > [EMAIL PROTECTED]
> > > > > > (450)434-8905
> > > > > >
> > > > > > ----- Original Message -----
> > > > > > From: Erik Hatcher <[EMAIL PROTECTED]>
> > > > > > Date: Wednesday, February 7, 2007 3:15 pm
> > > > > > Subject: Re: Question concerning Analyzers
> > > > > >
> > > > > > > There is no requirement that you use the same analyzer to search
> > > > > > > as you used to index.  So, yes, you could certainly index things
> > > > > > > and ignore them during a search.
> > > > > > >
> > > > > > >       Erik
> > > > > > >
> > > > > > >
> > > > > > > On Feb 7, 2007, at 2:10 PM, Xavier To wrote:
> > > > > > >
> > > > > > > > Hi, me again
> > > > > > > >
> > > > > > > > I'm still stuck with my search engine, but something popped in
> > > > > > > > my head: Can an analyzer index something but ignore it during a
> > > > > > > > search? I'm asking this because now that I've been searching for
> > > > > > > > an answer, I've come to think that I should redo the whole
> > > > > > > > search engine, but I don't want to reproduce the same error as
> > > > > > > > we have now. It would be stupid to accidentally redo the same
> > > > > > > > mistake. I still haven't received news from my seniors about me
> > > > > > > > posting code and all...
> > > > > > > >
> > > > > > > > Xavier Tô
> > > > > > > > Bachelor's degree in Computer Science and Software Engineering
> > > > > > > > [EMAIL PROTECTED]
> > > > > > > > (450)434-8905
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > ---------------------------------------------------------------------
> > > > > > > > To unsubscribe, e-mail: java-user-[EMAIL PROTECTED]
> > > > > > > > For additional commands, e-mail: java-user-[EMAIL PROTECTED]
> > > > > > >
> > > > > > > --------------------------------------------------------
> ----
> > > ----
> > > > > ----
> > > > > > > -
> > > > > > > To unsubscribe, e-mail: java-user-
> > > [EMAIL PROTECTED]> > > > For additional commands, e-
> > > mail: [EMAIL PROTECTED]
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > ----------------------------------------------------------
> ----
> > > ----
> > > > > ---
> > > > > > To unsubscribe, e-mail: java-user-
> [EMAIL PROTECTED]> > > > > For additional commands, e-
> mail: java-user-
> > > [EMAIL PROTECTED]> > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > --------------------------------------------------------------
> ----
> > > ---
> > > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > > For additional commands, e-mail: java-user-
> [EMAIL PROTECTED]> > >
> > > >
> > >
> >
> >
> > ------------------------------------------------------------------
> ---
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> >
> >
>


