Re: multi-term synonym expansion

Ahmet Arslan Tue, 06 Jul 2010 06:42:30 -0700

> My custom SKOSAnalyzer already performs synonym expansion
> based on the labels defined in a given SKOS model. But now I
> have the problem that real-world thesauri often define
> (multi-term) synonyms for multi-term words. Here is an
> example that defines the abbreviation "UN" as a synonym for
> "United Nations":
> 
> <skos:Concept rdf:about="http://www.cs.univie.ac.at/thesaurus/concept/6">
>   <skos:prefLabel>United Nations</skos:prefLabel>
>   <skos:altLabel>UN</skos:altLabel>
> </skos:Concept>
> 
> At the end the analyzer should add the term UN at the right
> position in the index. Taking the example above, a sentence
> "I work for the United Nations" should appear in the index
> as 
> 
> 2: [work: 2-> 6]
> 5: [united nations: 15->29] [un: 15->29]
> 
> ...so that a query "I work for the UN" also matches the
> document.
> 
> What is the best way to implement that? With a
> TokenFilter I can work through the sentence token by token
> (using incrementToken()) and check if there is a synonym
> available. How can I analyze token sequences in a given
> text? Do I need to implement a custom tokenizer that
> recognizes entities based on a given dictionary?
> 
> I am grateful for any suggestions or advice.


Solr's SynonymFilterFactory can handle multi-word synonyms, so it may
help here:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
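
To illustrate the general idea behind multi-word matching, here is a minimal plain-Java sketch. It is not Lucene or Solr code and not how SynonymFilterFactory is actually implemented. It only shows the lookahead a token filter would need: buffer the next few tokens, try the longest phrase first, and on a match inject the synonym at the position of the first matched word, with offsets spanning the whole phrase. The names MultiTermSynonymSketch, Tok, tokenize and expand are made up for this example, and the whitespace tokenizer and 0-based positions are simplifying assumptions. Unlike the index layout in the question, the original words stay separate tokens rather than being merged into one "united nations" token.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class MultiTermSynonymSketch {

    /** Hypothetical token type: text, index position, character offsets. Not Lucene's Token. */
    static final class Tok {
        final String text;
        final int pos, start, end;

        Tok(String text, int pos, int start, int end) {
            this.text = text;
            this.pos = pos;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return pos + ": [" + text + ": " + start + "->" + end + "]";
        }
    }

    /** Splits on whitespace, lowercases, and records positions and offsets. */
    static List<Tok> tokenize(String s) {
        List<Tok> out = new ArrayList<>();
        int i = 0, pos = 0, n = s.length();
        while (i < n) {
            while (i < n && Character.isWhitespace(s.charAt(i))) i++;
            int start = i;
            while (i < n && !Character.isWhitespace(s.charAt(i))) i++;
            if (i > start) {
                out.add(new Tok(s.substring(start, i).toLowerCase(Locale.ROOT), pos++, start, i));
            }
        }
        return out;
    }

    /**
     * Emits every input token unchanged. If a phrase of 2..maxWords tokens
     * starting at a token is a key in the synonym map (keys are lowercase,
     * words separated by one space), the synonym is added at that token's
     * position, with offsets covering the whole phrase.
     */
    static List<Tok> expand(List<Tok> in, Map<String, String> synonyms, int maxWords) {
        List<Tok> out = new ArrayList<>();
        for (int i = 0; i < in.size(); i++) {
            Tok first = in.get(i);
            out.add(first);
            // Try the longest candidate phrase first.
            for (int len = Math.min(maxWords, in.size() - i); len >= 2; len--) {
                StringBuilder phrase = new StringBuilder(first.text);
                for (int k = 1; k < len; k++) {
                    phrase.append(' ').append(in.get(i + k).text);
                }
                String syn = synonyms.get(phrase.toString());
                if (syn != null) {
                    Tok last = in.get(i + len - 1);
                    out.add(new Tok(syn, first.pos, first.start, last.end));
                    break;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> synonyms = new HashMap<>();
        synonyms.put("united nations", "un"); // prefLabel -> altLabel from the SKOS example
        for (Tok t : expand(tokenize("I work for the United Nations"), synonyms, 3)) {
            System.out.println(t);
        }
    }
}
```

Because "un" shares a position with "united" and directly follows "the", a phrase query for "the UN" can line up with the indexed text. In a real analyzer the same buffering would have to happen inside incrementToken(), with the synonym emitted using a position increment of 0.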




