Re: Spanish analyzer in ravendb

Simon Svensson Thu, 14 Jun 2012 10:45:16 -0700

It's easy to write analyzers: you basically chain together a few TokenFilters and call it a day. To back up that statement, I provide an example Spanish analyzer written by someone who basically threw his complete Spanish vocabulary into the stop word list. DictionaryLoader is a class which loads your hunspell dictionaries (.aff and .dic files) from your storage (filesystem, embedded resources, etc). There is some further development that can be done, like overriding ReusableTokenStream and verifying that the filters are in the correct order.


using System;
using System.Collections;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Hunspell;
using Lucene.Net.Analysis.Standard;
using Version = Lucene.Net.Util.Version;


public class SpanishHunspellAnalyzer : Analyzer {

    // DictionaryLoader loads the .aff and .dic files from your storage.
    private static readonly HunspellDictionary Dictionary = DictionaryLoader.Load(@"es_ES");

    private static readonly Hashtable Stopwords = new Hashtable {
        { "Me", null }, { "no", null }, { "habla", null }, { "español", null }
    };

    public override TokenStream TokenStream(String fieldName, TextReader reader) {
        var stream = new StandardTokenizer(Version.LUCENE_29, reader);

        // Chain the filters: lowercase, stem, then drop stop words.
        TokenFilter filter = new LowerCaseFilter(stream);
        filter = new HunspellStemFilter(filter, Dictionary);
        filter = new StopFilter(true, filter, Stopwords, true);
        return filter;
    }
}

// Simon
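
The ReusableTokenStream override mentioned above could look roughly like the sketch below. It assumes the Lucene.Net 2.9.x API, where Analyzer exposes GetPreviousTokenStream() and SetPreviousTokenStream() as methods (later versions changed this), and it assumes every filter in the chain can safely be reused once it has been fully consumed. The class name ReusableSpanishHunspellAnalyzer and the SavedStreams helper are made up for illustration; DictionaryLoader is the same class described in the message and is not shown here.

```csharp
using System;
using System.Collections;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Hunspell;
using Lucene.Net.Analysis.Standard;
using Version = Lucene.Net.Util.Version;

// Hypothetical variant of the analyzer above that caches its filter chain per thread.
public class ReusableSpanishHunspellAnalyzer : Analyzer {

    private static readonly HunspellDictionary Dictionary = DictionaryLoader.Load(@"es_ES");

    private static readonly Hashtable Stopwords = new Hashtable {
        { "Me", null }, { "no", null }, { "habla", null }, { "español", null }
    };

    // Holds the tokenizer (so it can be pointed at a new reader) and the end of the chain.
    private sealed class SavedStreams {
        public Tokenizer Source;
        public TokenStream Result;
    }

    private static SavedStreams BuildChain(TextReader reader) {
        var source = new StandardTokenizer(Version.LUCENE_29, reader);
        TokenStream result = new LowerCaseFilter(source);
        result = new HunspellStemFilter(result, Dictionary);
        result = new StopFilter(true, result, Stopwords, true);
        return new SavedStreams { Source = source, Result = result };
    }

    public override TokenStream TokenStream(String fieldName, TextReader reader) {
        return BuildChain(reader).Result;
    }

    public override TokenStream ReusableTokenStream(String fieldName, TextReader reader) {
        // Assumption: Lucene.Net 2.9.x, where the previous stream is stored per thread
        // through these two methods.
        var streams = GetPreviousTokenStream() as SavedStreams;
        if (streams == null) {
            streams = BuildChain(reader);
            SetPreviousTokenStream(streams);
        } else {
            // Reuse the existing chain; only the tokenizer needs the new reader.
            streams.Source.Reset(reader);
        }
        return streams.Result;
    }
}
```

This follows the same pattern StandardAnalyzer uses in that Lucene version: build the chain once per thread, then reset the tokenizer for each new piece of text.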

On 2012-06-14 18:44, vicente garcia wrote:

Thank you Simon. You can specify a
"Raven.Database.Indexing.Collation.Cultures.EsCollationAnalyzer,
Raven.Database", but you can't perform full text search queries because
this analyzer doesn't tokenize the content.
http://ravendb.net/docs/client-api/querying/static-indexes/customizing-results-order

I saw that there is no SpanishAnalyzer, we only have a
SpanishStemmer, but I don't need a stemmer, I need a Spanish analyzer
with its stop words, etc.

Does anyone have another idea on how to index Spanish content?

Thank you very much :)

On Thu, Jun 14, 2012 at 4:59 PM, Simon Svensson <si...@devhost.se> wrote:

Welcome,

See Configuring index options[1] to specify a custom analyzer that can
handle Spanish content.

A quick check shows that Contrib.Analyzers does not contain a Spanish
analyzer. There is a SpanishStemmer available in the Snowball contrib. You
could also use a Spanish hunspell dictionary for stemming[2].

// Simon

[1]
http://ravendb.net/docs/client-api/querying/static-indexes/configuring-index-options
[2] https://github.com/sisve/Lucene.Net.Analysis.Hunspell
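
As a rough illustration of the index options in [1], an index that assigns a custom analyzer to one field might be defined as in the sketch below. Everything specific here is an assumption made for the example: the server URL, the index name "Articles/ByContent", the "Articles" collection, its "Content" field, and the assembly name "MyAnalyzers" are placeholders, and the namespaces are those of the 2012-era RavenDB client, which may differ in other builds. The analyzer assembly also has to be available to the RavenDB server for the type name to resolve.

```csharp
using Raven.Abstractions.Indexing;
using Raven.Client.Document;

public static class SpanishIndexSetup {
    public static void Main() {
        // Assumption: a RavenDB server is listening on this URL.
        using (var store = new DocumentStore { Url = "http://localhost:8080" }.Initialize()) {
            store.DatabaseCommands.PutIndex("Articles/ByContent", new IndexDefinition {
                // Hypothetical collection and field names.
                Map = "from doc in docs.Articles select new { doc.Content }",
                Analyzers = {
                    // Assembly-qualified name of the custom analyzer (placeholder assembly).
                    { "Content", "SpanishHunspellAnalyzer, MyAnalyzers" }
                }
            });
        }
    }
}
```

With an analyzer assigned, the field is tokenized at indexing time, which is what full text queries against it rely on.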


On 2012-06-14 16:49, vicente garcia wrote:

Hi to all, this is my first mail to this list :)

I'd like to index Spanish content in RavenDB. I have been searching a
lot, but I don't know how I can do it.

Could someone help me please?

Thanks :)
