Re: Question about StandardAnalyzer.cs

Jokin Cuadrado Thu, 05 Mar 2009 01:19:41 -0800

Could you expand the code a bit more? At the least I want to see where you
instantiate the analyzer, where you open the writer, what term you use as
the key for updating the document, and how you create the document fields.
Also, to rule out other kinds of problems and isolate this one, you can
try something like this in pseudocode:


Create a new index
add 1 document (with just 1 indexed, stored and tokenized field containing
"Z123456")
close index
open index
search document
close

Then test whether it works. If it doesn't, post your code and we will see
what is happening.
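
The steps above might look roughly like the following in C#. This is only a
sketch, assuming the Lucene.Net 2.x API of that era (IndexWriter(Directory,
Analyzer, bool), Field.Index.TOKENIZED, Hits). The class name RoundTripTest,
its two helper methods and the in-memory RAMDirectory are choices made for
illustration; the field name "author_id" is the one mentioned further down
in the thread.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Hypothetical test harness, not code from the original application.
public static class RoundTripTest
{
    // Create a new index, add one document, close the index.
    public static Directory BuildIndex(Analyzer analyzer)
    {
        Directory dir = new RAMDirectory();               // in-memory, nothing left on disk
        IndexWriter writer = new IndexWriter(dir, analyzer, true);
        Document doc = new Document();
        // One field: stored, indexed and tokenized, containing "Z123456".
        doc.Add(new Field("author_id", "Z123456",
                          Field.Store.YES, Field.Index.TOKENIZED));
        writer.AddDocument(doc);
        writer.Close();
        return dir;
    }

    // Open the index, run the query, close, and report the number of hits.
    public static int CountHits(Directory dir, Query query)
    {
        IndexSearcher searcher = new IndexSearcher(dir);
        try
        {
            return searcher.Search(query).Length();
        }
        finally
        {
            searcher.Close();
        }
    }

    public static void Main()
    {
        Analyzer analyzer = new StandardAnalyzer();       // same analyzer for indexing and querying
        Directory dir = BuildIndex(analyzer);

        // Query text that goes through the analyzer before it reaches the index.
        Query parsed = new QueryParser("author_id", analyzer).Parse("Z123456");
        // Raw terms: these bypass the analyzer and are matched as-is.
        Query rawUpper = new TermQuery(new Term("author_id", "Z123456"));
        Query rawLower = new TermQuery(new Term("author_id", "z123456"));

        Console.WriteLine("parsed  : " + CountHits(dir, parsed));
        Console.WriteLine("rawUpper: " + CountHits(dir, rawUpper));
        Console.WriteLine("rawLower: " + CountHits(dir, rawLower));
    }
}
```

If the document really was indexed through StandardAnalyzer, the parsed query
and the lowercase TermQuery should each report one hit and the uppercase
TermQuery none, because a Term is compared against the index as-is and is
never passed through the analyzer. The same holds for the Term handed to
UpdateDocument.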


On Thu, Mar 5, 2009 at 8:58 AM, Floyd Wu <floyd...@gmail.com> wrote:

> Hi Jokin,
>
> Thanks for your reply. I'm very sure that I am using the analyzer (the one
> compiled from SVN trunk) to index my document. The following is the code snippet:
>
>                m_Writer.UpdateDocument(
>                    term,
>                    LuceneDocumentConverter.ToDocument(content),
>                    Analyzer);
> Pretty simple, and I pass the analyzer in.
> I don't know why.
>
> 2009/3/5 Jokin Cuadrado <joki...@gmail.com>
>
> > First of all, the stored value of a field is different from its indexed
> > terms; which of them are you referring to? If it works when you remove
> > the lowercase filter, then I'm pretty sure the filter is not being applied
> > at index-writing time, so either you are not using the StandardAnalyzer
> > or you have used a version without the lowercase filter. Could you
> > post the snippet of the index-creation code?
> >
> >
> > On 3/5/09, Floyd Wu <floyd...@gmail.com> wrote:
> > > Hi Michael,
> > > I'm sure that I use StandardAnalyzer when indexing. The problem is that I
> > > need to get a search result when I query "Z123456" against my index field
> > > named "author_id"; currently this field's value is shown as "Z123456" by
> > > Luke-0.8.1 in the index.
> > >
> > > I've been stuck on this for a month. Please help.
> > > Thanks
> > >
> > >
> > >
> > > 2009/3/5 Michael Mitiaguin <mitiag...@gmail.com>
> > >
> > >> As mentioned in this thread, could you re-check that you explicitly use
> > >> StandardAnalyzer when indexing?
> > >> I must admit, though, that I am still using 2.0.4.
> > >>
> > >>  writer = new IndexWriter(indexdir, new StandardAnalyzer(), true);
> > >>
> > >> In Luke, if you select Plugins > Analyzer tool > StandardAnalyzer,
> > >> it also lowercases the text:
> > >> Original text: Z123456, tokens found: z123456
> > >>
> > >> On Thu, Mar 5, 2009 at 2:45 PM, Floyd Wu <floyd...@gmail.com> wrote:
> > >>
> > >> > I'm sure the application and Luke use the same analyzer,
> > >> > StandardAnalyzer.
> > >> > But I can't search for "Z123456" and I don't know why. As long as I
> > >> > comment out this line in StandardAnalyzer.cs:
> > >> > result = new LowerCaseFilter(result);
> > >> > the result is what I want.
> > >> >
> > >> >
> > >> >
> > >> > 2009/3/4 Jokin Cuadrado <joki...@gmail.com>
> > >> >
> > >> > > Using Luke you can try other analyzers as well, for example the
> > >> > > KeywordAnalyzer. But as regards your application, you must use the
> > >> > > same analyzer when you build your index and when you query it.
> > >> > >
> > >> > > On Wed, Mar 4, 2009 at 10:50 AM, Floyd Wu <floyd...@gmail.com>
> > wrote:
> > >> > >
> > >> > > > But the current situation is: I can't get any result for "Z123456"
> > >> > > > when I type "Z123456" or "z123456".
> > >> > > >
> > >> > > > I'm using StandardAnalyzer and, according to Luke, the value indexed
> > >> > > > is "Z123456".
> > >> > > > How can I fix this problem?
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > 2009/3/4 Jokin Cuadrado <joki...@gmail.com>
> > >> > > >
> > >> > > > > The rationale behind using the lowercase filter is that it will
> > >> > > > > match when you search for either Z123456 or z123456, so searches
> > >> > > > > are case insensitive. However, as with any filter, you must use the
> > >> > > > > same analyzer when indexing your documents. Are you doing that?
> > >> > > > >
> > >> > > > > On Wed, Mar 4, 2009 at 9:31 AM, Floyd Wu <floyd...@gmail.com>
> > >> wrote:
> > >> > > > >
> > >> > > > > > Hi all,
> > >> > > > > > My problem is that I have a field that is set to be Indexed &
> > >> > > > > > Stored. The index value is Z123456.
> > >> > > > > > But when I use StandardAnalyzer to search this field, it seems
> > >> > > > > > that StandardAnalyzer transforms my query text "Z123456" to
> > >> > > > > > "z123456". After walking through the source code, I found the
> > >> > > > > > following lines:
> > >> > > > > >  public override TokenStream TokenStream(System.String fieldName, System.IO.TextReader reader)
> > >> > > > > >  {
> > >> > > > > >   StandardTokenizer tokenStream = new StandardTokenizer(reader, replaceInvalidAcronym);
> > >> > > > > >   tokenStream.SetMaxTokenLength(maxTokenLength);
> > >> > > > > >   TokenStream result = new StandardFilter(tokenStream);
> > >> > > > > >   result = new LowerCaseFilter(result);
> > >> > > > > >   result = new StopFilter(result, stopSet);
> > >> > > > > >   return result;
> > >> > > > > >  }
> > >> > > > > >
> > >> > > > > > Why is LowerCaseFilter used here? If I comment out this line,
> > >> > > > > > will I have any potential problems?
> > >> > > > > > I think my "Z123456" is transformed to "z123456" by this filter.
> > >> > > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > > --
> > >> > > > > Jokin
> > >> > > > > Sent from: Sant cugat del valles  Spain.
> > >> > > > >
> > >> > > >
> > >> > >
> > >> > >
> > >> > >
> > >> > > --
> > >> > > Jokin
> > >> > >
> > >> >
> > >>
> > >
> >
> >
> > --
> > Jokin
> >
>



-- 
Jokin
