Re: Question about StandardAnalyzer.cs

Floyd Wu Wed, 04 Mar 2009 23:58:59 -0800

Hi Jokin,

Thanks for your reply. I'm quite sure that I am using the analyzer (the one
compiled from SVN trunk) to index my documents. The following is the code
snippet:


                m_Writer.UpdateDocument(
                    term,
                    LuceneDocumentConverter.ToDocument(content),
                    Analyzer);

It is pretty simple, and I do pass the analyzer in.
I don't know why the search still fails.

2009/3/5 Jokin Cuadrado <joki...@gmail.com>

> First of all, a field's stored value is different from its indexed
> terms; which of them are you referring to? If removing the lowercase
> filter makes it work, then I'm pretty sure the filter is not being
> applied at index-writing time, so either you are not using the
> StandardAnalyzer or you have used a version without the lowercase
> filter. Could you post a snippet of the index creation code?
>
>
> On 3/5/09, Floyd Wu <floyd...@gmail.com> wrote:
> > Hi Michael,
> > I'm sure that I use StandardAnalyzer when indexing. The problem is that I
> > need to get search results when I query "Z123456" against my index field
> > named "author_id"; currently this field's value is shown as "Z123456" by
> > Luke-0.8.1 in the index.
> >
> > I've been stuck on this for a month. Please help.
> > Thanks
> >
> >
> >
> > 2009/3/5 Michael Mitiaguin <mitiag...@gmail.com>
> >
> >> As mentioned in this thread, could you re-check that you explicitly use
> >> StandardAnalyzer when indexing?
> >> I must admit, though, that I am still using 2.0.4.
> >>
> >>  writer = new IndexWriter(indexdir, new StandardAnalyzer(), true);
> >>
> >> In Luke, if you select plugins > Analyzer tool > StandardAnalyzer,
> >> it also lowercases the text:
> >> Original text: Z123456  tokens found: z123456
> >>
> >> On Thu, Mar 5, 2009 at 2:45 PM, Floyd Wu <floyd...@gmail.com> wrote:
> >>
> >> > I'm sure the application and Luke use the same analyzer,
> >> > StandardAnalyzer.
> >> > But I can't search for "Z123456" and I don't know why. As long as I
> >> > comment out this line in StandardAnalyzer.cs:
> >> > result = new LowerCaseFilter(result);
> >> > the result is what I want.
> >> >
> >> >
> >> >
> >> > 2009/3/4 Jokin Cuadrado <joki...@gmail.com>
> >> >
> >> > > In Luke you can use other analyzers as well, so try the keyword
> >> > > analyzer, for example. But as regards your application, you must use
> >> > > the same analyzer when you build your index and when you query it.
> >> > >
> >> > > On Wed, Mar 4, 2009 at 10:50 AM, Floyd Wu <floyd...@gmail.com>
> wrote:
> >> > >
> >> > > > But the current situation is: I can't get any result for "Z123456"
> >> > > > whether I type "Z123456" or "z123456".
> >> > > >
> >> > > > I'm using StandardAnalyzer and, according to Luke, the indexed value
> >> > > > is "Z123456".
> >> > > > How can I fix this problem?
> >> > > >
> >> > > >
> >> > > >
> >> > > > 2009/3/4 Jokin Cuadrado <joki...@gmail.com>
> >> > > >
> >> > > > > The rationale behind using the lowercase filter is that it will
> >> > > > > match when you search for either Z123456 or z123456, so searches
> >> > > > > are case insensitive. However, as with any filter, you must use
> >> > > > > the same analyzer when indexing your documents. Are you doing that?
> >> > > > >
> >> > > > > On Wed, Mar 4, 2009 at 9:31 AM, Floyd Wu <floyd...@gmail.com>
> >> wrote:
> >> > > > >
> >> > > > > > Hi all,
> >> > > > > > My problem is that I have a field that is set to be Indexed &
> >> > > > > > Stored. The indexed value is Z123456.
> >> > > > > > But when I use StandardAnalyzer to search this field, it seems
> >> > > > > > that StandardAnalyzer transforms my query text "Z123456" to
> >> > > > > > "z123456". After walking through the source code, I found the
> >> > > > > > following lines:
> >> > > > > >  public override TokenStream TokenStream(System.String fieldName,
> >> > > > > >      System.IO.TextReader reader)
> >> > > > > >  {
> >> > > > > >   StandardTokenizer tokenStream = new StandardTokenizer(reader,
> >> > > > > >       replaceInvalidAcronym);
> >> > > > > >   tokenStream.SetMaxTokenLength(maxTokenLength);
> >> > > > > >   TokenStream result = new StandardFilter(tokenStream);
> >> > > > > >   result = new LowerCaseFilter(result);
> >> > > > > >   result = new StopFilter(result, stopSet);
> >> > > > > >   return result;
> >> > > > > >  }
> >> > > > > >
> >> > > > > > Why is LowerCaseFilter used here? If I comment out this line,
> >> > > > > > will I have any potential problems?
> >> > > > > > I think my "Z123456" is transformed to "z123456" by this filter.
> >> > > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > --
> >> > > > > Jokin
> >> > > > > Sent from: Sant cugat del valles  Spain.
> >> > > > >
> >> > > >
> >> > >
> >> > >
> >> > >
> >> > > --
> >> > > Jokin
> >> > >
> >> >
> >>
> >
>
>
> --
> Jokin
>
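
A note for anyone finding this thread in the archive: one possible
explanation, which nobody in the thread confirms, is that "author_id" was
indexed as a single untokenized term. In that case the index holds the exact
term "Z123456", while a query parsed with StandardAnalyzer looks for
"z123456", and nothing matches. The sketch below tries to reproduce that and
shows one way around it without editing StandardAnalyzer.cs: keep
StandardAnalyzer as the default but use KeywordAnalyzer for "author_id" when
parsing queries. It assumes a Lucene.Net 2.x-era API (Hits,
Field.Index.UN_TOKENIZED, the three-argument IndexWriter constructor). The
class name AuthorIdDemo and the helper CountHits are made up for
illustration, and the way the field is indexed here is a guess, not Floyd's
actual LuceneDocumentConverter.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Hypothetical demo class; not part of Lucene.Net or of the original code.
public static class AuthorIdDemo
{
    // Indexes one document whose author_id is a single untokenized term
    // (ASSUMPTION: this mirrors how the field was indexed in the thread),
    // then parses queryText with queryAnalyzer and returns the hit count.
    public static int CountHits(Analyzer queryAnalyzer, string queryText)
    {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);

        Document doc = new Document();
        // UN_TOKENIZED bypasses the analyzer, so the term keeps its case.
        doc.Add(new Field("author_id", "Z123456",
                          Field.Store.YES, Field.Index.UN_TOKENIZED));
        writer.AddDocument(doc);
        writer.Close();

        // The query text does go through the analyzer passed in here.
        Query query = new QueryParser("author_id", queryAnalyzer).Parse(queryText);

        IndexSearcher searcher = new IndexSearcher(dir);
        int count = searcher.Search(query).Length();
        searcher.Close();
        return count;
    }

    public static void Main()
    {
        // Query side lowercases "Z123456"; the index term is still uppercase.
        Console.WriteLine("StandardAnalyzer query hits: "
            + CountHits(new StandardAnalyzer(), "Z123456"));

        // Query side leaves author_id untouched; other fields keep the default.
        PerFieldAnalyzerWrapper perField =
            new PerFieldAnalyzerWrapper(new StandardAnalyzer());
        perField.AddAnalyzer("author_id", new KeywordAnalyzer());
        Console.WriteLine("Per-field KeywordAnalyzer query hits: "
            + CountHits(perField, "Z123456"));
    }
}
```

If the field is in fact tokenized with StandardAnalyzer at index time, the
first call should already find the document and the cause lies elsewhere. An
alternative to the per-field wrapper is to skip the query parser for this
field and build a TermQuery on the exact value.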
