[jira] Created: (SOLR-1279) ApostropheTokenizer

Sergey Borisov (JIRA) Tue, 14 Jul 2009 11:27:38 -0700

ApostropheTokenizer
-------------------

                 Key: SOLR-1279
                 URL: https://issues.apache.org/jira/browse/SOLR-1279
             Project: Solr
          Issue Type: New Feature
          Components: Analysis
            Reporter: Sergey Borisov
            Priority: Minor



ApostropheTokenizer creates extra tokens during the analysis stage for 
fields containing apostrophes. The goal is to ensure that documents that 
differ only by an apostrophe receive the same relevancy score. 

For example, if a document contains the string "McDonald's", it will be 
tokenized as "McDonald's McDonalds". This way, a search for either 
"McDonald's" or "McDonalds" will produce a similar score.

This code handles up to two apostrophes in a token.
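
The expansion described above can be sketched in plain Java, without any 
Lucene or Solr classes. This is only an illustration, not the attached 
implementation: the class name ApostropheExpansionSketch and the method 
expand are hypothetical. It also assumes that a token with one or two 
apostrophes yields a single extra variant with all apostrophes removed, and 
that tokens with more than two apostrophes are passed through unchanged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the code attached to this issue.
class ApostropheExpansionSketch {

    // Assumption: mirrors the "up to two apostrophes" limit stated above.
    static final int MAX_APOSTROPHES = 2;

    // Returns the original token, followed by an apostrophe-free variant
    // when the token contains between 1 and MAX_APOSTROPHES apostrophes.
    static List<String> expand(String token) {
        List<String> out = new ArrayList<>();
        out.add(token);

        int count = 0;
        for (int i = 0; i < token.length(); i++) {
            if (token.charAt(i) == '\'') {
                count++;
            }
        }

        if (count >= 1 && count <= MAX_APOSTROPHES) {
            String stripped = token.replace("'", "");
            // Skip the variant if stripping leaves nothing to index.
            if (!stripped.isEmpty()) {
                out.add(stripped);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expand("McDonald's")); // prints [McDonald's, McDonalds]
    }
}
```

In a real token filter the extra variant would presumably be emitted at the 
same position as the original token, so that phrase queries keep working. 
That detail is not stated in this description and is left out of the sketch.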

To use this tokenizer, add the following to schema.xml:

<analyzer type="index">
      <filter class="org.apache.lucene.analysis.ApostropheTokenFactory"/>
...
</analyzer>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
