Improve interaction of SynonymFilterFactory with the analysis chain
-------------------------------------------------------------------

                 Key: SOLR-2648
                 URL: https://issues.apache.org/jira/browse/SOLR-2648
             Project: Solr
          Issue Type: Improvement
          Components: Schema and Analysis
    Affects Versions: 3.4, 4.0
            Reporter: Robert Muir


Spinoff of LUCENE-3233 (there is a TODO for this). This was also mentioned by
Otis on the mailing list:
http://www.lucidimagination.com/search/document/8e91f858314562e/automatic_synonyms_for_multiple_variations_of_a_word#76c3d09f95f7a58f

As of LUCENE-3233, the builder for the synonym structure uses an Analyzer
behind the scenes to tokenize the entries in your synonyms file.
Currently the Solr factory uses a WhitespaceTokenizer unless you supply the
tokenizerFactory parameter, which lets you specify a different tokenizer.

If there were some way to specify a full analysis chain to this factory (e.g.
char filters, a tokenizer, and token filters such as stemmers) rather than just
a tokenizer factory, it would be a lot more flexible (e.g. it could stem your
synonyms for you) and would solve this use case.
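To make the difference concrete, here is a toy Java model of analyzing one synonyms-file entry with a tokenizer alone versus a full chain of a char filter, a tokenizer, and token filters. It is not Lucene or Solr code: `Chain`, `whitespace`, `lowercase` and `stripPlural` are hypothetical names, and the naive plural-stripping filter only stands in for a real stemmer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Toy model of an analysis chain (char filters -> tokenizer -> token filters)
// used to analyze synonym entries. These are NOT Lucene/Solr classes.
public class SynonymChainSketch {

    static final class Chain {
        private final List<UnaryOperator<String>> charFilters;
        private final Function<String, List<String>> tokenizer;
        private final List<UnaryOperator<List<String>>> tokenFilters;

        Chain(List<UnaryOperator<String>> charFilters,
              Function<String, List<String>> tokenizer,
              List<UnaryOperator<List<String>>> tokenFilters) {
            this.charFilters = charFilters;
            this.tokenizer = tokenizer;
            this.tokenFilters = tokenFilters;
        }

        List<String> analyze(String text) {
            for (UnaryOperator<String> cf : charFilters) {
                text = cf.apply(text);                   // rewrite raw characters
            }
            List<String> tokens = tokenizer.apply(text); // split into tokens
            for (UnaryOperator<List<String>> tf : tokenFilters) {
                tokens = tf.apply(tokens);               // transform the token list
            }
            return tokens;
        }
    }

    // Splits on runs of whitespace, skipping empty strings.
    static List<String> whitespace(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    static List<String> lowercase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) out.add(t.toLowerCase(Locale.ROOT));
        return out;
    }

    // Naive stand-in for a stemmer: strips a trailing "s" from words longer than 3 chars.
    static List<String> stripPlural(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.length() > 3 && t.endsWith("s") ? t.substring(0, t.length() - 1) : t);
        }
        return out;
    }

    // Roughly the current behavior: a tokenizer only.
    static Chain tokenizerOnly() {
        return new Chain(Collections.emptyList(), SynonymChainSketch::whitespace,
                Collections.emptyList());
    }

    // Proposed behavior: a full chain with a char filter and token filters.
    static Chain fullChain() {
        return new Chain(
                Arrays.asList(s -> s.replace('-', ' ')),
                SynonymChainSketch::whitespace,
                Arrays.asList(SynonymChainSketch::lowercase, SynonymChainSketch::stripPlural));
    }

    public static void main(String[] args) {
        String entry = "Wi-Fi Routers";
        System.out.println(tokenizerOnly().analyze(entry)); // [Wi-Fi, Routers]
        System.out.println(fullChain().analyze(entry));     // [wi, fi, router]
    }
}
```

The tokenizer-only chain leaves the entry as `[Wi-Fi, Routers]`, while the full chain yields `[wi, fi, router]`: splitting, lowercasing and stemming that a tokenizer alone cannot provide.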

Personally I think it would be ideal if this just worked automatically: e.g.
if you have a chain of A, B, SynonymFilter, C, D, then in my opinion the
synonyms should be analyzed with an analysis chain of A, B. This way the
injected synonyms are processed as if they had been in the token stream to
begin with.
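A sketch of that automatic behavior, again as a hypothetical toy model rather than the Solr factory API: the chain is a list of named stages, and synonym entries are run only through the stages that precede the synonym filter (A and B below). C and D are identity placeholders for filters that would apply after the synonyms are injected; `Stage`, `stagesBeforeSynonyms` and `run` are invented names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical model of "analyze synonyms with whatever precedes the
// synonym filter". Stage and the filters below are illustrative only.
public class ChainPrefixSketch {

    static final String SYNONYMS = "SynonymFilter";

    static final class Stage {
        final String name;
        final UnaryOperator<List<String>> op;

        Stage(String name, UnaryOperator<List<String>> op) {
            this.name = name;
            this.op = op;
        }
    }

    // For a chain A, B, SynonymFilter, C, D this returns [A, B].
    static List<Stage> stagesBeforeSynonyms(List<Stage> chain) {
        for (int i = 0; i < chain.size(); i++) {
            if (chain.get(i).name.equals(SYNONYMS)) {
                return chain.subList(0, i);
            }
        }
        throw new IllegalArgumentException("no " + SYNONYMS + " in chain");
    }

    // Applies each stage in order to the token list.
    static List<String> run(List<Stage> stages, List<String> tokens) {
        for (Stage s : stages) {
            tokens = s.op.apply(tokens);
        }
        return tokens;
    }

    // Lifts a per-token function to a whole-list stage.
    static UnaryOperator<List<String>> map(UnaryOperator<String> f) {
        return tokens -> {
            List<String> out = new ArrayList<>();
            for (String t : tokens) out.add(f.apply(t));
            return out;
        };
    }

    static List<Stage> exampleChain() {
        return Arrays.asList(
                new Stage("A", map(t -> t.toLowerCase(Locale.ROOT))), // lowercase
                new Stage("B", map(t -> t.endsWith("s")              // naive stemmer
                        ? t.substring(0, t.length() - 1) : t)),
                new Stage(SYNONYMS, tokens -> tokens),                // placeholder
                new Stage("C", map(t -> t)),                          // runs after injection
                new Stage("D", map(t -> t)));                         // runs after injection
    }

    public static void main(String[] args) {
        List<Stage> prefix = stagesBeforeSynonyms(exampleChain());
        // The synonym entry "Laptops" is analyzed by A and B only.
        System.out.println(run(prefix, Arrays.asList("Laptops"))); // [laptop]
    }
}
```

Here the entry `Laptops` becomes `[laptop]`, the same form a document token `laptops` would have when it reaches the synonym filter, so the two can match.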

Note: there are some limitations to what the chain can do. For example, you
can't put WDF (WordDelimiterFilter) or anything else that mucks with positions
before the synonym filter, and you can't have a synonym that analyzes to
nothing at all. But the parser checks for all of these conditions and throws a
syntax error, so it would be clear to the user that they put the synonym
filter in the "wrong place" in their chain.
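The validation described above could look roughly like the following. This is a simplified, hypothetical model: `Token`, `SynonymSyntaxException` and the toy stages are invented for illustration, and the real parser's checks and messages may differ. An entry is rejected if the preceding chain removes it entirely or emits a token whose position increment is not 1, as a WordDelimiterFilter-style stage that stacks word parts would.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Simplified model of the checks applied to one synonym entry.
// Token, SynonymSyntaxException and the chains are hypothetical.
public class SynonymEntryCheckSketch {

    static final class Token {
        final String term;
        final int posInc; // 1 = next position, 0 = stacked on the previous token

        Token(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }
    }

    static final class SynonymSyntaxException extends RuntimeException {
        SynonymSyntaxException(String message) {
            super(message);
        }
    }

    // Analyze an entry with the chain that precedes the synonym filter and
    // reject results the synonym structure cannot represent.
    static List<String> checkedTerms(String entry, Function<String, List<Token>> chain) {
        List<Token> tokens = chain.apply(entry);
        if (tokens.isEmpty()) {
            throw new SynonymSyntaxException("'" + entry + "' analyzed to nothing");
        }
        List<String> terms = new ArrayList<>();
        for (Token t : tokens) {
            if (t.posInc != 1) {
                throw new SynonymSyntaxException("'" + entry + "' produced '" + t.term
                        + "' with position increment " + t.posInc
                        + "; is the synonym filter in the wrong place?");
            }
            terms.add(t.term);
        }
        return terms;
    }

    // Whitespace tokenizer: every token advances one position.
    static List<Token> whitespace(String text) {
        List<Token> out = new ArrayList<>();
        for (String s : text.split("\\s+")) {
            if (!s.isEmpty()) out.add(new Token(s, 1));
        }
        return out;
    }

    // Crude WDF-like stage: keeps "wi-fi" and stacks its parts at the same position.
    static List<Token> splitOnHyphens(String text) {
        List<Token> out = new ArrayList<>();
        for (Token t : whitespace(text)) {
            out.add(t);
            if (t.term.contains("-")) {
                for (String part : t.term.split("-")) {
                    if (!part.isEmpty()) out.add(new Token(part, 0));
                }
            }
        }
        return out;
    }

    // Toy stopword stage: drops "the" (ignores the position gap a real stop filter leaves).
    static List<Token> dropThe(String text) {
        List<Token> out = new ArrayList<>();
        for (Token t : whitespace(text)) {
            if (!t.term.equalsIgnoreCase("the")) out.add(t);
        }
        return out;
    }

    static void tryEntry(String entry, Function<String, List<Token>> chain) {
        try {
            System.out.println(checkedTerms(entry, chain));
        } catch (SynonymSyntaxException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        tryEntry("fast car", SynonymEntryCheckSketch::whitespace); // [fast, car]
        tryEntry("wi-fi", SynonymEntryCheckSketch::splitOnHyphens); // stacked tokens
        tryEntry("the", SynonymEntryCheckSketch::dropThe);          // analyzes to nothing
    }
}
```

With these stages, `fast car` yields `[fast, car]`, while `wi-fi` (stacked parts) and `the` (removed entirely) are both rejected with a message pointing at the filter's placement.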

