Re: Question about StandardAnalyzer.cs

Floyd Wu Thu, 05 Mar 2009 01:25:45 -0800

To simplify the question, I did another test. I created the index from the
original document, but this time I changed "Z123456" to "z123456" before
putting it into the Lucene index. I ran the query and got what I wanted.
What does that mean?
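
One reading that is consistent with this result (an assumption, since the
code that builds the document fields is not shown in this thread): the
"author_id" field is indexed without passing through the analyzer, so the
term is kept verbatim as "Z123456", while the query text is still lowercased
to "z123456" and therefore never matches. The sketch below is a toy model in
plain C#, not Lucene.Net code. ToyIndexDemo, Analyze, BuildIndex and Search
are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: this is NOT Lucene.Net code. It mimics one idea from the
// thread: a lookup succeeds only when the indexed term and the analyzed
// query term are exactly equal, including case.
public static class ToyIndexDemo
{
    // Hypothetical stand-in for an analyzer that applies a lowercase filter.
    public static string Analyze(string text)
    {
        return text.ToLowerInvariant();
    }

    // Builds a one-term "index". When analyzeAtIndexTime is false the value
    // is kept verbatim (assumption: this is what happens to a field that is
    // indexed without being run through the analyzer).
    public static HashSet<string> BuildIndex(string value, bool analyzeAtIndexTime)
    {
        var terms = new HashSet<string>(StringComparer.Ordinal);
        terms.Add(analyzeAtIndexTime ? Analyze(value) : value);
        return terms;
    }

    // The query text is always analyzed before the lookup.
    public static bool Search(HashSet<string> index, string queryText)
    {
        return index.Contains(Analyze(queryText));
    }

    public static void Main()
    {
        HashSet<string> verbatimUpper = BuildIndex("Z123456", false);
        HashSet<string> verbatimLower = BuildIndex("z123456", false);
        HashSet<string> analyzed = BuildIndex("Z123456", true);

        Console.WriteLine(Search(verbatimUpper, "Z123456")); // False
        Console.WriteLine(Search(verbatimLower, "Z123456")); // True
        Console.WriteLine(Search(analyzed, "Z123456"));      // True
    }
}
```

If this reading is right, the place to look would be how
LuceneDocumentConverter.ToDocument creates the "author_id" field, and in
particular whether that field is indexed as tokenized or untokenized.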

2009/3/5 Jokin Cuadrado <joki...@gmail.com>

> Could you expand the code a bit more? At least I want to see where you
> instantiate the analyzer, where you open the writer, what term you use as
> the key for updating the document, and how you create the document fields.
> Also, to rule out other kinds of problems and isolate this one, you can
> do something like this in pseudocode:
>
> Create a new index
> add 1 document (with just 1 indexed, stored and tokenized field containing
> "Z123456")
> close index
> open index
> search document
> close
>
> and test whether it works. If it doesn't, post your code and we will see
> what is happening.
>
>
> On Thu, Mar 5, 2009 at 8:58 AM, Floyd Wu <floyd...@gmail.com> wrote:
>
> > Hi Jokin,
> >
> > Thanks for your reply. I'm very sure that I'm using the analyzer (the one
> > compiled from SVN trunk) to index my document. The following is the code
> > snippet:
> >
> >                m_Writer.UpdateDocument(
> >                    term,
> >                    LuceneDocumentConverter.ToDocument(content),
> >                    Analyzer);
> > It's pretty simple, and I pass the analyzer in.
> > I don't know why it doesn't work.
> >
> > 2009/3/5 Jokin Cuadrado <joki...@gmail.com>
> >
> > > First of all, a field's stored value is different from its indexed
> > > term value; which of them are you telling us about? If you remove the
> > > lowercase filter it works, so I'm pretty sure that the filter is not
> > > being applied at index writing time, which means either you are not
> > > using the StandardAnalyzer or you have used a version without the
> > > lowercase filter. Could you post the snippet of the index creation code?
> > >
> > >
> > > On 3/5/09, Floyd Wu <floyd...@gmail.com> wrote:
> > > > Hi Michael,
> > > > I'm sure that I use StandardAnalyzer when indexing. The problem is that
> > > > I need to get search results when I query "Z123456" against my index
> > > > field named "author_id", and currently Luke-0.8.1 shows this field's
> > > > value in the index as "Z123456".
> > > >
> > > > I'm stuck here for a month. Please help on this.
> > > > Thanks
> > > >
> > > >
> > > >
> > > > 2009/3/5 Michael Mitiaguin <mitiag...@gmail.com>
> > > >
> > > >> As mentioned in this thread, could you re-check that you explicitly
> > > >> use StandardAnalyzer when indexing?
> > > >> I must admit, though, that I am still using 2.0.4.
> > > >>
> > > >>  writer = new IndexWriter(indexdir, new StandardAnalyzer(), true);
> > > >>
> > > >> In Luke, if you select Plugins > Analyzer tool > StandardAnalyzer,
> > > >> it also lowercases the text:
> > > >> Original text : Z123456  tokens found : z123456
> > > >>
> > > >> On Thu, Mar 5, 2009 at 2:45 PM, Floyd Wu <floyd...@gmail.com> wrote:
> > > >>
> > > >> > I'm sure the application and Luke use the same analyzer,
> > > >> > StandardAnalyzer.
> > > >> > But I can't search for "Z123456" and I don't know why. As long as I
> > > >> > comment out this line in StandardAnalyzer.cs:
> > > >> > result = new LowerCaseFilter(result);
> > > >> > the result is what I want.
> > > >> >
> > > >> >
> > > >> >
> > > >> > 2009/3/4 Jokin Cuadrado <joki...@gmail.com>
> > > >> >
> > > >> > > In Luke you can use other analyzers as well, so try the keyword
> > > >> > > analyzer, for example. But as regards your application, you must
> > > >> > > use the same analyzer when you build your index and when you
> > > >> > > query it.
> > > >> > >
> > > >> > > On Wed, Mar 4, 2009 at 10:50 AM, Floyd Wu <floyd...@gmail.com> wrote:
> > > >> > >
> > > >> > > > But the current situation is: I can't get any result for
> > > >> > > > "Z123456", whether I type "Z123456" or "z123456".
> > > >> > > >
> > > >> > > > I'm using StandardAnalyzer and, according to Luke, the indexed
> > > >> > > > value is "Z123456".
> > > >> > > > How can I fix this problem?
> > > >> > > >
> > > >> > > >
> > > >> > > >
> > > >> > > > 2009/3/4 Jokin Cuadrado <joki...@gmail.com>
> > > >> > > >
> > > >> > > > > The rationale behind the lowercase filter is that it makes
> > > >> > > > > searches for both Z123456 and z123456 match, so searches are
> > > >> > > > > case insensitive. However, as with any filter, you must use
> > > >> > > > > the same analyzer when indexing your documents. Are you doing
> > > >> > > > > that?
> > > >> > > > >
> > > >> > > > > On Wed, Mar 4, 2009 at 9:31 AM, Floyd Wu <floyd...@gmail.com> wrote:
> > > >> > > > >
> > > >> > > > > > Hi all,
> > > >> > > > > > My problem is that I have a field that is set to be Indexed &
> > > >> > > > > > Stored. The indexed value is Z123456.
> > > >> > > > > > But when I use StandardAnalyzer to search this field, it seems
> > > >> > > > > > that StandardAnalyzer transforms my query text "Z123456" to
> > > >> > > > > > "z123456". After walking through the source code, I found the
> > > >> > > > > > following lines:
> > > >> > > > > >  public override TokenStream TokenStream(System.String fieldName,
> > > >> > > > > >      System.IO.TextReader reader)
> > > >> > > > > >  {
> > > >> > > > > >   StandardTokenizer tokenStream =
> > > >> > > > > >       new StandardTokenizer(reader, replaceInvalidAcronym);
> > > >> > > > > >   tokenStream.SetMaxTokenLength(maxTokenLength);
> > > >> > > > > >   TokenStream result = new StandardFilter(tokenStream);
> > > >> > > > > >   result = new LowerCaseFilter(result);
> > > >> > > > > >   result = new StopFilter(result, stopSet);
> > > >> > > > > >   return result;
> > > >> > > > > >  }
> > > >> > > > > >
> > > >> > > > > > Why is LowerCaseFilter used here? If I comment out this line,
> > > >> > > > > > will I have any potential problems?
> > > >> > > > > > I think my "Z123456" is transformed to "z123456" by this filter.
> > > >> > > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > --
> > > >> > > > > Jokin
> > > >> > > > > Sent from: Sant cugat del valles  Spain.
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > --
> > > >> > > Jokin
> > > >> > >
> > > >> >
> > > >>
> > > >
> > >
> > >
> > > --
> > > Jokin
> > >
> >
>
>
>
> --
> Jokin
>
