Re: Question about StandardAnalyzer.cs

Jokin Cuadrado Thu, 05 Mar 2009 01:48:54 -0800

That means you are indexing and searching for the same term, so Lucene finds it.
Run the test with Z123456 and post the code, and we will tell you where the
fault is.
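
To make the point about matching analyzers concrete, here is a toy sketch. It
is not Lucene.Net code: TinyIndex and the two analyzer functions are
hypothetical names made up for illustration, and a real analyzer does much more
than lowercase. It only assumes that a term lookup is an exact, case-sensitive
match, which is why the index-time and query-time analysis have to agree.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: this is NOT Lucene.Net. All names here are hypothetical.
public sealed class TinyIndex
{
    // term -> ids of the documents that contain it (case-sensitive keys)
    private readonly Dictionary<string, List<int>> postings =
        new Dictionary<string, List<int>>(StringComparer.Ordinal);

    // Store the document id under the term the index-time analyzer produces.
    public void Add(int docId, string text, Func<string, string> analyzer)
    {
        string term = analyzer(text);
        List<int> docs;
        if (!postings.TryGetValue(term, out docs))
        {
            docs = new List<int>();
            postings[term] = docs;
        }
        docs.Add(docId);
    }

    // Analyze the query text, then do an exact term lookup.
    public int CountHits(string queryText, Func<string, string> analyzer)
    {
        List<int> docs;
        return postings.TryGetValue(analyzer(queryText), out docs) ? docs.Count : 0;
    }
}

public static class AnalyzerMismatchDemo
{
    // Stands in for an analyzer chain that includes a lowercase filter.
    public static string Lowercasing(string text) { return text.ToLowerInvariant(); }

    // Stands in for an analyzer chain that leaves the text untouched.
    public static string Verbatim(string text) { return text; }

    public static void Main()
    {
        // Indexed without lowercasing, queried with lowercasing: no match.
        var mismatched = new TinyIndex();
        mismatched.Add(1, "Z123456", Verbatim);
        Console.WriteLine(mismatched.CountHits("Z123456", Lowercasing)); // 0

        // Same analyzer on both sides: both spellings match.
        var matched = new TinyIndex();
        matched.Add(1, "Z123456", Lowercasing);
        Console.WriteLine(matched.CountHits("Z123456", Lowercasing)); // 1
        Console.WriteLine(matched.CountHits("z123456", Lowercasing)); // 1
    }
}
```

If the real index behaves like the first case, the term was most likely written
without the lowercase filter, whatever the stored value looks like.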


On Thu, Mar 5, 2009 at 10:25 AM, Floyd Wu <floyd...@gmail.com> wrote:

> To simplify the question, I did another test. I created the index with the
> original document, but this time I changed "Z123456" to "z123456" before
> putting it into the Lucene index. I fired the query and got what I wanted.
> What does it mean?
>
>
>
>
> 2009/3/5 Jokin Cuadrado <joki...@gmail.com>
>
> > Could you expand the code a bit more? At least I want to see where you
> > instantiate the analyzer, where you open the writer, which term you use
> > as the key to update the document, and how you create the document fields.
> > Also, to rule out other kinds of problems and isolate this one, you can
> > do something like this in pseudocode:
> >
> > Create a new index
> > add 1 document (with just 1 indexed, stored and tokenized field
> > containing "Z123456")
> > close index
> > open index
> > search document
> > close
> >
> > Then test whether it works. If it doesn't, post your code and we will see
> > what is happening.
> >
> >
> > On Thu, Mar 5, 2009 at 8:58 AM, Floyd Wu <floyd...@gmail.com> wrote:
> >
> > > Hi Jokin,
> > >
> > > > Thanks for your reply. I'm very sure that I am using the analyzer (the
> > > > one compiled from SVN trunk) to index my document. The following is the
> > > > code snippet:
> > >
> > >                m_Writer.UpdateDocument(
> > >                    term,
> > >                    LuceneDocumentConverter.ToDocument(content),
> > >                    Analyzer);
> > > > Pretty simple, and I pass the analyzer in.
> > > > I don't know why.
> > >
> > > 2009/3/5 Jokin Cuadrado <joki...@gmail.com>
> > >
> > > > > First of all, a field's stored value is different from its indexed
> > > > > term values. Which of them are you telling us about? If removing the
> > > > > lowercase filter makes it work, I'm pretty sure you are not applying
> > > > > it at index-writing time, so either you are not using the
> > > > > StandardAnalyzer, or you used a version without the lowercase filter.
> > > > > Could you post the snippet of the index creation code?
> > > >
> > > >
> > > > On 3/5/09, Floyd Wu <floyd...@gmail.com> wrote:
> > > > > Hi Michael,
> > > > > > I'm sure that I use StandardAnalyzer when indexing. The problem is
> > > > > > that I need to get search results when I query "Z123456" against my
> > > > > > index field named "author_id", and currently Luke-0.8.1 shows this
> > > > > > field's value in the index as "Z123456".
> > > > >
> > > > > > I've been stuck here for a month. Please help with this.
> > > > > Thanks
> > > > >
> > > > >
> > > > >
> > > > > 2009/3/5 Michael Mitiaguin <mitiag...@gmail.com>
> > > > >
> > > > > >> As mentioned in this thread, could you re-check that you explicitly
> > > > > >> use StandardAnalyzer when indexing?
> > > > > >> I must admit, though, that I am still using 2.0.4.
> > > > >>
> > > > >>  writer = new IndexWriter(indexdir, new StandardAnalyzer(), true);
> > > > >>
> > > > > >> In Luke, if you select Plugins > Analyzer tool > StandardAnalyzer,
> > > > > >> it also lowercases the text:
> > > > > >> Original text : Z123456  tokens found : z123456
> > > > >>
> > > > > >> On Thu, Mar 5, 2009 at 2:45 PM, Floyd Wu <floyd...@gmail.com> wrote:
> > > > >>
> > > > > >> > I'm sure the application and Luke use the same analyzer,
> > > > > >> > StandardAnalyzer.
> > > > > >> > But I can't find "Z123456" and I don't know why. As long as I
> > > > > >> > comment out this line in StandardAnalyzer.cs:
> > > > > >> > result = new LowerCaseFilter(result);
> > > > > >> > the result is what I want.
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > 2009/3/4 Jokin Cuadrado <joki...@gmail.com>
> > > > >> >
> > > > > >> > > In Luke you can use other analyzers as well, so try the keyword
> > > > > >> > > analyzer, for example. But as regards your application, you must
> > > > > >> > > use the same analyzer when you build your index and when you
> > > > > >> > > query it.
> > > > >> > >
> > > > > >> > > On Wed, Mar 4, 2009 at 10:50 AM, Floyd Wu <floyd...@gmail.com> wrote:
> > > > >> > >
> > > > > >> > > > But the current situation is: I can't get any result for
> > > > > >> > > > "Z123456", whether I type "Z123456" or "z123456".
> > > > > >> > > >
> > > > > >> > > > I'm using StandardAnalyzer and, according to Luke, the indexed
> > > > > >> > > > value is "Z123456".
> > > > > >> > > > How can I fix this problem?
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > 2009/3/4 Jokin Cuadrado <joki...@gmail.com>
> > > > >> > > >
> > > > > >> > > > > The rationale behind the lowercase filter is that a search
> > > > > >> > > > > matches for both Z123456 and z123456, so searches are case
> > > > > >> > > > > insensitive. However, as with any filter, you must use the
> > > > > >> > > > > same analyzer when indexing your documents. Are you doing
> > > > > >> > > > > that?
> > > > >> > > > >
> > > > > >> > > > > On Wed, Mar 4, 2009 at 9:31 AM, Floyd Wu <floyd...@gmail.com> wrote:
> > > > >> > > > >
> > > > >> > > > > > Hi all,
> > > > > >> > > > > > My problem is that I have a field that is set to be
> > > > > >> > > > > > Indexed & Stored. The indexed value is Z123456.
> > > > > >> > > > > > But when I use StandardAnalyzer to search this field, it
> > > > > >> > > > > > seems that StandardAnalyzer transforms my query text
> > > > > >> > > > > > "Z123456" to "z123456". After walking through the source
> > > > > >> > > > > > code, I found the following lines:
> > > > > >> > > > > >  public override TokenStream TokenStream(System.String fieldName, System.IO.TextReader reader)
> > > > > >> > > > > >  {
> > > > > >> > > > > >   StandardTokenizer tokenStream = new StandardTokenizer(reader, replaceInvalidAcronym);
> > > > > >> > > > > >   tokenStream.SetMaxTokenLength(maxTokenLength);
> > > > > >> > > > > >   TokenStream result = new StandardFilter(tokenStream);
> > > > > >> > > > > >   result = new LowerCaseFilter(result);
> > > > > >> > > > > >   result = new StopFilter(result, stopSet);
> > > > > >> > > > > >   return result;
> > > > > >> > > > > >  }
> > > > >> > > > > >
> > > > > >> > > > > > Why is LowerCaseFilter used here? If I comment out this
> > > > > >> > > > > > line, will I have any potential problems?
> > > > > >> > > > > > I think my "Z123456" is transformed to "z123456" by this
> > > > > >> > > > > > filter.
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > > --
> > > > >> > > > > Jokin
> > > > >> > > > > Sent from: Sant cugat del valles  Spain.
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > > --
> > > > >> > > Jokin
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > >
> > > >
> > > > --
> > > > Jokin
> > > >
> > >
> >
> >
> >
> > --
> > Jokin
> >
>



-- 
Jokin
Sent from: Barcelona Catalonia Spain.
