Re: German*Filter, Analyzer "cutting" off letters from (french) words...

Erick Erickson Mon, 18 Apr 2011 04:36:12 -0700

You can easily string together your own tokenizer and any number of filters
to create an analyzer that does exactly what you need. Lucene In Action
shows an example for creating your own analyzer by assembling
the standard parts....


Best
Erick

On Mon, Apr 18, 2011 at 3:08 AM, Clemens Wyss <[email protected]> wrote:
> What is the best way to "avoid" the lowercasing (and still being able to 
> exclude stop words)?
>
>> -----Ursprüngliche Nachricht-----
>> Von: Simon Willnauer [mailto:[email protected]]
>> Gesendet: Freitag, 15. April 2011 08:56
>> An: [email protected]
>> Betreff: Re: German*Filter, Analyzer "cutting" off letters from (french)
>> words...
>>
>> On Fri, Apr 15, 2011 at 8:48 AM, Clemens Wyss <[email protected]>
>> wrote:
>> > Does the StandardAnalyzer lowercase its terms?
>> yes!
>>
>> simon
>> >
>> >> -----Ursprüngliche Nachricht-----
>> >> Von: Clemens Wyss [mailto:[email protected]]
>> >> Gesendet: Mittwoch, 13. April 2011 13:34
>> >> An: [email protected]
>> >> Betreff: AW: German*Filter, Analyzer "cutting" off letters from
>> >> (french) words...
>> >>
>> >> This is what I was looking for, thanks
>> >>
>> >> > -----Ursprüngliche Nachricht-----
>> >> > Von: Robert Muir [mailto:[email protected]]
>> >> > Gesendet: Mittwoch, 13. April 2011 12:11
>> >> > An: [email protected]
>> >> > Betreff: Re: German*Filter, Analyzer "cutting" off letters from
>> >> > (french) words...
>> >> >
>> >> > If you only want to ignore german stopwords, you don't need to use
>> >> > the german analyzer with german stemming. you can just use
>> >> > StandardAnalyzer with your own stopwords set!
>> >> >
>> >> > On Wed, Apr 13, 2011 at 3:51 AM, Clemens Wyss
>> >> <[email protected]>
>> >> > wrote:
>> >> > > What I really want to do is ignore german stop words such as
>> >> > > "der", "die",
>> >> > "das", "ein",...
>> >> > >
>> >> > >> -----Ursprüngliche Nachricht-----
>> >> > >> Von: Robert Muir [mailto:[email protected]]
>> >> > >> Gesendet: Dienstag, 12. April 2011 17:03
>> >> > >> An: [email protected]
>> >> > >> Betreff: Re: German*Filter, Analyzer "cutting" off letters from
>> >> > >> (french) words...
>> >> > >>
>> >> > >> On Tue, Apr 12, 2011 at 8:46 AM, Clemens Wyss
>> >> > <[email protected]>
>> >> > >> wrote:
>> >> > >> > Why so? Where have the e's gone?
>> >> > >> >
>> >> > >>
>> >> > >> the e is being stemmed as its a german suffix... all of the
>> >> > >> german stemming algorithms remove final -e, as do all the french
>> >> > >> stemming
>> >> > algorithms.
>> >> > >>
>> >> > >> so i don't understand your problem.
>> >> > >>
>> >> > >> ----------------------------------------------------------------
>> >> > >> ---
>> >> > >> -- To unsubscribe, e-mail:
>> >> > >> [email protected]
>> >> > >> For additional commands, e-mail:
>> >> > >> [email protected]
>> >> > >
>> >> > >
>> >> >
>> >> > -------------------------------------------------------------------
>> >> > -- To unsubscribe, e-mail: [email protected]
>> >> > For additional commands, e-mail: [email protected]
>> >
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: German*Filter, Analyzer "cutting" off letters from (french) words...

Reply via email to