One more question to the group.  From what I have gathered, my choices for indexing 
and querying Spanish content are:

1.  StandardAnalyzer (I read that this analyzer could be used for "European" languages)

2.  SnowballAnalyzer("Spanish", SPANISH_STOP_WORDS);  <--custom stop words from 
Ernesto class below

Can I assume that choice 2 would be the better for Spanish content?

thanks,
chad.



-----Original Message-----
From: Ernesto De Santis [mailto:[EMAIL PROTECTED]
Sent: Monday, August 23, 2004 3:31 PM
To: Lucene Users List
Subject: Re: spanish stemmer 


Because the SnowballAnalyzer, and SpanishStemmer don�t have a default
stopword set.

SnowballAnalyzer constructor:

  /** Builds the named analyzer with no stop words. */
  public SnowballAnalyzer(String name) {
    this.name = name;
  }

Note the comment.

Bye,
Ernesto.

----- Original Message ----- 
From: "Chad Small" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Monday, August 23, 2004 4:57 PM
Subject: RE: spanish stemmer


Excellent Ernesto.

Was there a reason you used your own stop word list and not just the default
constructor SnowballAnalyzer("Spanish")?

thanks,
chad.

-----Original Message-----
From: Ernesto De Santis [mailto:[EMAIL PROTECTED]
Sent: Monday, August 23, 2004 2:03 PM
To: Lucene Users List
Subject: Re: spanish stemmer


Yes, is too easy.

You need do a wrapper for spanish Snowball initilization.

analyzer = new SnowballAnalyzer("Spanish", SPANISH_STOP_WORDS);

above the complete code.

Bye, Ernesto.


--------------------------------------------------
public class SpanishAnalyzer extends Analyzer {

private static SnowballAnalyzer analyzer;


private String SPANISH_STOP_WORDS[] = {

"un", "una", "unas", "unos", "uno", "sobre", "todo", "tambi�n", "tras",
"otro", "alg�n", "alguno", "alguna",

"algunos", "algunas", "ser", "es", "soy", "eres", "somos", "sois", "estoy",
"esta", "estamos", "estais",

"estan", "en", "para", "atras", "porque", "por qu�", "estado", "estaba",
"ante", "antes", "siendo",

"ambos", "pero", "por", "poder", "puede", "puedo", "podemos", "podeis",
"pueden", "fui", "fue", "fuimos",

"fueron", "hacer", "hago", "hace", "hacemos", "haceis", "hacen", "cada",
"fin", "incluso", "primero",

"desde", "conseguir", "consigo", "consigue", "consigues", "conseguimos",
"consiguen", "ir", "voy", "va",

"vamos", "vais", "van", "vaya", "bueno", "ha", "tener", "tengo", "tiene",
"tenemos", "teneis", "tienen",

"el", "la", "lo", "las", "los", "su", "aqui", "mio", "tuyo", "ellos",
"ellas", "nos", "nosotros", "vosotros",

"vosotras", "si", "dentro", "solo", "solamente", "saber", "sabes", "sabe",
"sabemos", "sabeis", "saben",

"ultimo", "largo", "bastante", "haces", "muchos", "aquellos", "aquellas",
"sus", "entonces", "tiempo",

"verdad", "verdadero", "verdadera", "cierto", "ciertos", "cierta",
"ciertas", "intentar", "intento",

"intenta", "intentas", "intentamos", "intentais", "intentan", "dos", "bajo",
"arriba", "encima", "usar",

"uso", "usas", "usa", "usamos", "usais", "usan", "emplear", "empleo",
"empleas", "emplean", "ampleamos",

"empleais", "valor", "muy", "era", "eras", "eramos", "eran", "modo", "bien",
"cual", "cuando", "donde",

"mientras", "quien", "con", "entre", "sin", "trabajo", "trabajar",
"trabajas", "trabaja", "trabajamos",

"trabajais", "trabajan", "podria", "podrias", "podriamos", "podrian",
"podriais", "yo", "aquel", "mi",

"de", "a", "e", "i", "o", "u"};

public SpanishAnalyzer() {

analyzer = new SnowballAnalyzer("Spanish", SPANISH_STOP_WORDS);

}

public SpanishAnalyzer(String stopWords[]) {

analyzer = new SnowballAnalyzer("Spanish", stopWords);

}

public TokenStream tokenStream(String fieldName, Reader reader) {

return analyzer.tokenStream(fieldName, reader);

}

}





---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to