[jira] [Issue Comment Edited] (SOLR-1279) ApostropheTokenizer

Mauro Asprea (Issue Comment Edited) (JIRA) Thu, 16 Feb 2012 01:03:28 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209231#comment-13209231
 ]


Mauro Asprea edited comment on SOLR-1279 at 2/16/12 9:02 AM:
-------------------------------------------------------------

I can confirm this works using the WordDelimiterFilterFactory, as Robert said:

{code}
<filter class="solr.WordDelimiterFilterFactory"
        stemEnglishPossessive="0"
        preserveOriginal="1"
        catenateAll="1"/>
{code}

Then, using the Solr Admin Analysis page, I get the following:
Value: McDonald's

||Indexed Term||
|McDonald's|
|Mc|
|Donald|
|s|
|McDonalds|

One caveat: make sure that no earlier filter in the chain removes the trailing 
"'s". In my case I had the StandardFilterFactory, which does remove trailing 
apostrophes.
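
As a rough sketch, an analyzer chain that respects this ordering could look like the following. The field type name text_apostrophe and the use of WhitespaceTokenizerFactory are assumptions for illustration only; they are not taken from the setup described above. The point is simply that nothing ahead of the WordDelimiterFilterFactory strips the "'s".

{code}
<!-- Hypothetical field type; the name and tokenizer are assumptions -->
<fieldType name="text_apostrophe" class="solr.TextField">
  <analyzer>
    <!-- Splits on whitespace only, so "McDonald's" reaches the filter intact -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- No StandardFilterFactory before this point -->
    <filter class="solr.WordDelimiterFilterFactory"
            stemEnglishPossessive="0"
            preserveOriginal="1"
            catenateAll="1"/>
  </analyzer>
</fieldType>
{code}

Any further filters (lowercasing, stemming, and so on) would go after the WordDelimiterFilterFactory, and the Analysis page can be used to check the resulting terms.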
                
> ApostropheTokenizer
> -------------------
>
>                 Key: SOLR-1279
>                 URL: https://issues.apache.org/jira/browse/SOLR-1279
>             Project: Solr
>          Issue Type: New Feature
>          Components: Schema and Analysis
>            Reporter: Sergey Borisov
>            Priority: Minor
>             Fix For: 3.6, 4.0
>
>         Attachments: ApostropheTokenizer.zip
>
>
> ApostropheTokenizer creates extra tokens during the analysis stage for 
> fields containing apostrophes. The reason for adding this is to ensure that 
> documents that differ only by an apostrophe have the same relevancy score. 
> For example, if the document contains the string "McDonald's", it will be 
> tokenized as "McDonald's McDonalds". This way, a search for "McDonald's" 
> or "McDonalds" will produce a similar score.
> This code handles up to two apostrophes in a token.
> To use this tokenizer, add the following lines to schema.xml:
> <analyzer type="index">
>       <filter class="org.apache.lucene.analysis.ApostropheTokenFactory"/>
> ...
> </analyzer>

