> My custom SKOSAnalyzer already performs synonym expansion
> based on the labels defined in a given SKOS model. But now I
> have the problem that real-world thesauri often define
> (multi-term) synonyms for multi-term words. Here is an
> example that defines the abbreviation "UN" as a synonym for
> "United Nations":
>
> <skos:Concept rdf:about="http://www.cs.univie.ac.at/thesaurus/concept/6">
>   <skos:prefLabel>United Nations</skos:prefLabel>
>   <skos:altLabel>UN</skos:altLabel>
> </skos:Concept>
>
> In the end, the analyzer should add the term UN at the right
> position in the index. Taking the example above, a sentence
> "I work for the United Nations" should appear in the index
> as
>
> 2: [work: 2->6]
> 5: [united nations: 15->29] [un: 15->29]
>
> ...so that a query "I work for the UN" also matches the
> document.
>
> What is the best way to implement that? With a
> TokenFilter I can work through the sentence token by token
> (using incrementToken()) and check whether a synonym is
> available. How can I analyze token sequences in a given
> text? Do I need to implement a custom tokenizer that
> recognizes entities based on a given dictionary?
>
> I am grateful for any suggestions or advice.

Solr's SynonymFilterFactory can handle multi-word synonyms; this may help:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
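
For illustration only, here is a minimal plain-Java sketch of the general idea behind multi-word synonym matching: look ahead over the token sequence, prefer the longest phrase found in the dictionary, and stack the synonym at the same position as the matched phrase. It is not the Lucene or Solr implementation. The class name MultiWordSynonymSketch, the method expand, and the dictionary layout (phrase tokens mapped to one synonym) are assumptions made up for this example, and it leaves out offsets, stop words, and position increments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene or Solr.
final class MultiWordSynonymSketch {

    /**
     * Returns one entry per output position. Each entry lists the terms
     * stacked at that position: the matched phrase (or single token)
     * first, followed by its synonym if the dictionary has one.
     */
    static List<List<String>> expand(List<String> tokens,
                                     Map<List<String>, String> synonyms) {
        // The longest phrase in the dictionary bounds how far we look ahead.
        int maxLen = 1;
        for (List<String> phrase : synonyms.keySet()) {
            maxLen = Math.max(maxLen, phrase.size());
        }

        List<List<String>> positions = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matched = 0;
            String synonym = null;
            // Try the longest candidate window first (greedy longest match).
            for (int len = Math.min(maxLen, tokens.size() - i); len >= 1; len--) {
                String candidate = synonyms.get(tokens.subList(i, i + len));
                if (candidate != null) {
                    matched = len;
                    synonym = candidate;
                    break;
                }
            }

            List<String> stacked = new ArrayList<>();
            if (matched > 0) {
                // Emit the whole phrase and its synonym at the same position.
                stacked.add(String.join(" ", tokens.subList(i, i + matched)));
                stacked.add(synonym);
                i += matched;
            } else {
                // No dictionary entry starts here: pass the token through.
                stacked.add(tokens.get(i));
                i++;
            }
            positions.add(stacked);
        }
        return positions;
    }

    public static void main(String[] args) {
        Map<List<String>, String> synonyms = new HashMap<>();
        synonyms.put(Arrays.asList("united", "nations"), "un");

        List<String> tokens =
            Arrays.asList("i", "work", "for", "the", "united", "nations");
        System.out.println(expand(tokens, synonyms));
    }
}
```

In a real analyzer the same look-ahead would have to live inside a TokenFilter that buffers tokens before emitting them, which is the kind of work the synonym filter mentioned above is meant to take care of.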