How does the synonym filter work internally? I configured it with a very large synonym file (90,000 lines), running Solr in GlassFish. It started fine, but when I queried it, it hung and ran out of memory. The file itself wasn't big enough to exhaust the heap. I never was able to get it to run smoothly.
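On the token-sequence question quoted below, here is a minimal sketch of one way to match multi-word entries: a greedy longest-match lookup over a window of tokens, with the synonym emitted at the same position as the matched phrase. This is plain Java and is not how Solr's SynonymFilter is implemented. The class MultiWordSynonymSketch, the Tok type, the expand method and the maxPhraseLen parameter are all hypothetical names made up for illustration, and the input is assumed to be already tokenized and lowercased.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not part of Lucene or Solr. */
final class MultiWordSynonymSketch {

    /** A term plus its token position; tokens sharing a position are stacked. */
    static final class Tok {
        final String term;
        final int position;

        Tok(String term, int position) {
            this.term = term;
            this.position = position;
        }

        @Override
        public String toString() {
            return term + "@" + position;
        }
    }

    /**
     * Scans the tokens left to right. At each index it tries the longest
     * phrase first (at most maxPhraseLen tokens). On a hit it emits the
     * phrase and its synonym at the same position and skips the matched
     * tokens; otherwise it emits the single token unchanged.
     * Keys of the synonyms map are space-joined phrases (an assumption).
     */
    static List<Tok> expand(List<String> tokens,
                            Map<String, String> synonyms,
                            int maxPhraseLen) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matched = 0;
            String phrase = null;
            int longest = Math.min(maxPhraseLen, tokens.size() - i);
            for (int len = longest; len >= 1; len--) {
                String candidate = String.join(" ", tokens.subList(i, i + len));
                if (synonyms.containsKey(candidate)) {
                    matched = len;
                    phrase = candidate;
                    break;
                }
            }
            if (matched == 0) {
                out.add(new Tok(tokens.get(i), i));
                i++;
            } else {
                out.add(new Tok(phrase, i));               // original phrase
                out.add(new Tok(synonyms.get(phrase), i)); // stacked synonym
                i += matched;
            }
        }
        return out;
    }
}
```

Calling expand on the tokens of "i work for the united nations" with the single entry "united nations" -> "un" and maxPhraseLen = 2 leaves the first four tokens alone and stacks "united nations" and "un" at the position of "united". In a real Lucene TokenFilter the stacking would be expressed by giving the second token a position increment of 0 (PositionIncrementAttribute), and the look-ahead would need buffered token state rather than a list.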
On Tue, 6 Jul 2010 06:40:54 -0700 (PDT), Ahmet Arslan <iori...@yahoo.com> wrote:
>> My custom SKOSAnalyzer already performs synonym expansion
>> based on the labels defined in a given SKOS model. But now I
>> have the problem that real-world thesauri often define
>> (multi-term) synonyms for multi-term words. Here is an
>> example that defines the abbreviation "UN" as a synonym for
>> "United Nations":
>>
>> <skos:Concept rdf:about="http://www.cs.univie.ac.at/thesaurus/concept/6">
>>   <skos:prefLabel>United Nations</skos:prefLabel>
>>   <skos:altLabel>UN</skos:altLabel>
>> </skos:Concept>
>>
>> In the end, the analyzer should add the term UN at the right
>> position in the index. Taking the example above, a sentence
>> "I work for the United Nations" should appear in the index
>> as
>>
>> 2: [work: 2-> 6]
>> 5: [united nations: 15->29] [un: 15->29]
>>
>> ...so that a query "I work for the UN" also matches the
>> document.
>>
>> What is the best solution to implement that? With a
>> TokenFilter I can work through the sentence token by token
>> (using incrementToken()) and check if there is a synonym
>> available. How can I analyze token sequences in a given
>> text? Do I need to implement a custom tokenizer that
>> recognizes entities based on a given dictionary?
>>
>> I am grateful for any suggestions or advice.
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
> can handle multi-word synonyms. This may help.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org