On Mon, 17 Nov 2008 13:07:35 -0200
Rafael Cunha de Almeida <[EMAIL PROTECTED]> wrote:
> On Thu, 13 Nov 2008 12:12:17 -0500
> Matthew Hall <[EMAIL PROTECTED]> wrote:
>
> > Which Analyzer have you assigned per field?
> >
> > The PerFieldAnalyzerWrapper uses a default analyzer (the one you passed
> > during its construction), and then you assign specific analyzers to each
> > field that you want to have special treatment.
> >
> > For example:
> >
> > PerFieldAnalyzerWrapper aWrapper = new PerFieldAnalyzerWrapper(
> > new StandardAnalyzer());
> > aWrapper.addAnalyzer("data", new MGIAnalyzer());
> > aWrapper.addAnalyzer("sdata", new StemmedMGIAnalyzer());
> >
> > Now, for the fields in question, have you assigned an Analyzer that
> > doesn't actually use stopwords? (there are several available in core)
> > Or are you perchance using a custom Analyzer that doesn't process stop
> > words?
> >
> > Could you possibly post your Initialization code for this? If so I
> > think we could be of more help to you.
>
> I wrote this method, which returns me the analyzer:
> static public Analyzer getAnalyzer()
> {
> PerFieldAnalyzerWrapper analyzer = new PerFieldAnalyzerWrapper(
> new KeywordAnalyzer());
> analyzer.addAnalyzer("placas", new UniqueTokensAnalyzer());
> analyzer.addAnalyzer("ano", new UniqueTokensAnalyzer());
> analyzer.addAnalyzer("no_reds", new NumberAnalyzer());
>
> analyzer.addAnalyzer("nomes", new SimpleBrazilianAnalyzer());
> analyzer.addAnalyzer("apelidos", new SimpleBrazilianAnalyzer());
> analyzer.addAnalyzer("historico", new SimpleBrazilianAnalyzer
> ()); analyzer.addAnalyzer("modosAcaoCriminosa", new
> SimpleBrazilianAnalyzer()); analyzer.addAnalyzer("nomeMunicipio", new
> SimpleBrazilianAnalyzer()); analyzer.addAnalyzer("nomeBairro", new
> SimpleBrazilianAnalyzer()); analyzer.addAnalyzer("logradouro", new
> SimpleBrazilianAnalyzer()); analyzer.addAnalyzer("textoComplementar",
> new SimpleBrazilianAnalyzer());
>
> return analyzer;
> }
>
> SimpleBrazilianAnalyzer is my own analyzer that uses stopwords.
>
> I pass that analyzer to MultiFieldQueryParser together with an array
> with all the fields, ie. those fields and more.
>
> When I do an AND search I'd like it to ignore stopwords. My best idea
> so far is to make my own tokenizer and remove stopwords from the
> search. How does that sound?
Another related problem, exact search cannot succeed if there's a
stopword in between terms of the exact search. For instance:
"Ready to ride"
will never return any documents if "to" is a stopword. Shouldn't the
stopwords be ignored as if they weren't even there?
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]