One more question to the group. From what I have gathered, my choices for indexing
and querying Spanish content are:
1. StandardAnalyzer (I read that this analyzer could be used for "European" languages)
2. SnowballAnalyzer("Spanish", SPANISH_STOP_WORDS); <--custom stop words from
Ernesto class below
Can I assume that choice 2 would be the better for Spanish content?
thanks,
chad.
-----Original Message-----
From: Ernesto De Santis [mailto:[EMAIL PROTECTED]
Sent: Monday, August 23, 2004 3:31 PM
To: Lucene Users List
Subject: Re: spanish stemmer
Because the SnowballAnalyzer, and SpanishStemmer don�t have a default
stopword set.
SnowballAnalyzer constructor:
/** Builds the named analyzer with no stop words. */
public SnowballAnalyzer(String name) {
this.name = name;
}
Note the comment.
Bye,
Ernesto.
----- Original Message -----
From: "Chad Small" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Monday, August 23, 2004 4:57 PM
Subject: RE: spanish stemmer
Excellent Ernesto.
Was there a reason you used your own stop word list and not just the default
constructor SnowballAnalyzer("Spanish")?
thanks,
chad.
-----Original Message-----
From: Ernesto De Santis [mailto:[EMAIL PROTECTED]
Sent: Monday, August 23, 2004 2:03 PM
To: Lucene Users List
Subject: Re: spanish stemmer
Yes, is too easy.
You need do a wrapper for spanish Snowball initilization.
analyzer = new SnowballAnalyzer("Spanish", SPANISH_STOP_WORDS);
above the complete code.
Bye, Ernesto.
--------------------------------------------------
public class SpanishAnalyzer extends Analyzer {
private static SnowballAnalyzer analyzer;
private String SPANISH_STOP_WORDS[] = {
"un", "una", "unas", "unos", "uno", "sobre", "todo", "tambi�n", "tras",
"otro", "alg�n", "alguno", "alguna",
"algunos", "algunas", "ser", "es", "soy", "eres", "somos", "sois", "estoy",
"esta", "estamos", "estais",
"estan", "en", "para", "atras", "porque", "por qu�", "estado", "estaba",
"ante", "antes", "siendo",
"ambos", "pero", "por", "poder", "puede", "puedo", "podemos", "podeis",
"pueden", "fui", "fue", "fuimos",
"fueron", "hacer", "hago", "hace", "hacemos", "haceis", "hacen", "cada",
"fin", "incluso", "primero",
"desde", "conseguir", "consigo", "consigue", "consigues", "conseguimos",
"consiguen", "ir", "voy", "va",
"vamos", "vais", "van", "vaya", "bueno", "ha", "tener", "tengo", "tiene",
"tenemos", "teneis", "tienen",
"el", "la", "lo", "las", "los", "su", "aqui", "mio", "tuyo", "ellos",
"ellas", "nos", "nosotros", "vosotros",
"vosotras", "si", "dentro", "solo", "solamente", "saber", "sabes", "sabe",
"sabemos", "sabeis", "saben",
"ultimo", "largo", "bastante", "haces", "muchos", "aquellos", "aquellas",
"sus", "entonces", "tiempo",
"verdad", "verdadero", "verdadera", "cierto", "ciertos", "cierta",
"ciertas", "intentar", "intento",
"intenta", "intentas", "intentamos", "intentais", "intentan", "dos", "bajo",
"arriba", "encima", "usar",
"uso", "usas", "usa", "usamos", "usais", "usan", "emplear", "empleo",
"empleas", "emplean", "ampleamos",
"empleais", "valor", "muy", "era", "eras", "eramos", "eran", "modo", "bien",
"cual", "cuando", "donde",
"mientras", "quien", "con", "entre", "sin", "trabajo", "trabajar",
"trabajas", "trabaja", "trabajamos",
"trabajais", "trabajan", "podria", "podrias", "podriamos", "podrian",
"podriais", "yo", "aquel", "mi",
"de", "a", "e", "i", "o", "u"};
public SpanishAnalyzer() {
analyzer = new SnowballAnalyzer("Spanish", SPANISH_STOP_WORDS);
}
public SpanishAnalyzer(String stopWords[]) {
analyzer = new SnowballAnalyzer("Spanish", stopWords);
}
public TokenStream tokenStream(String fieldName, Reader reader) {
return analyzer.tokenStream(fieldName, reader);
}
}
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]