WG: Filter and stop-words

Ruffieux Stephane, yellowworld extern Mon, 03 Dec 2001 04:33:19 -0800

Yes, it is a way. You have to implement
org.apache.lucene.analysis.TokenFilter and to implement the next() method
with the algorithm for Portugese. After that you can create a
PortugeseAnalyser (inherits from Analyser) which use it.


In the actual version of Lucene it's an implementation for german. It can
perhaps help you. 

The following link gives algorithms for some european langages:
http://snowball.sourceforge.net/texts/stemmersoverview.html
<http://snowball.sourceforge.net/texts/stemmersoverview.html> 

Best regards.

St�phane

PS: I work on the same problem, but for french and italian. Had somebody
still programming stemmer for this languages?

-----Urspr�ngliche Nachricht-----
Von:    Bizu de An�ncio [SMTP:[EMAIL PROTECTED]]
<mailto:[SMTP:[EMAIL PROTECTED]]> 
Gesendet am:    Montag, 3. Dezember 2001 13:22
An:     [EMAIL PROTECTED]
<mailto:[EMAIL PROTECTED]> 
Betreff:        Filter and stop-words

        I'm new to Lucene. First of all I would like to know if there is a
search
arquive like "sun servlets list".

        My first problem is that I want to index a Portuguese database and I
need
to remove the "s" (plural) and acents (� � ...) from the words. Is there a
way of passing a filter class to the Lucene indexer ? And about the
stop-words, where should I configure Lucene to ignore it ?

        Any help would be appreciated,

        thanks a lot,

                jk


--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]
<mailto:[EMAIL PROTECTED]> >
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]
<mailto:[EMAIL PROTECTED]> >

--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>

WG: Filter and stop-words

Reply via email to