[
https://issues.apache.org/jira/browse/SOLR-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209231#comment-13209231
]
Mauro Asprea edited comment on SOLR-1279 at 2/16/12 9:02 AM:
-------------------------------------------------------------
I confirm this is working using the WordDelimiterFilterFactory like Robert said:
{code}
<filter class="solr.WordDelimiterFilterFactory"
stemEnglishPossessive="0"
preserveOriginal="1"
catenateAll="1"/>
{code}
Using the Solr Admin Analysis page, I get the following:
Value: McDonald's
||Indexed Term||
|McDonald's|
|Mc|
|Donald|
|s|
|McDonalds|
One thing: make sure that no earlier filter removes the trailing "'s". In my
case I had the StandardFilterFactory, which does remove trailing
apostrophes.
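To illustrate the ordering point, here is a minimal sketch of a field type in which nothing ahead of the WordDelimiterFilterFactory strips the "'s". The field type name "text_apostrophe" and the choice of WhitespaceTokenizerFactory are assumptions made for illustration only, not part of the setup described above:
{code}
<!-- Hypothetical field type; the name "text_apostrophe" is illustrative only -->
<fieldType name="text_apostrophe" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Assumed tokenizer: splitting on whitespace leaves "McDonald's" intact -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- No StandardFilterFactory before this point, so the "'s" is still present -->
    <filter class="solr.WordDelimiterFilterFactory"
            stemEnglishPossessive="0"
            preserveOriginal="1"
            catenateAll="1"/>
  </analyzer>
</fieldType>
{code}
Any further filters (lowercasing, stop words, and so on) would go after the WordDelimiterFilterFactory in this sketch. The Analysis page can be used to check the resulting terms for a given chain.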
> ApostropheTokenizer
> -------------------
>
> Key: SOLR-1279
> URL: https://issues.apache.org/jira/browse/SOLR-1279
> Project: Solr
> Issue Type: New Feature
> Components: Schema and Analysis
> Reporter: Sergey Borisov
> Priority: Minor
> Fix For: 3.6, 4.0
>
> Attachments: ApostropheTokenizer.zip
>
>
> ApostropheTokenizer creates extra tokens during the analysis stage for
> fields containing apostrophes. The goal is to ensure that documents that
> differ only by an apostrophe get the same relevancy score.
> For example, if the document contains the string "McDonald's", it will be
> tokenized as "McDonald's McDonalds". This way, a search for "McDonald's"
> or "McDonalds" produces a similar score.
> This code handles up to two apostrophes in a token.
> To use this tokenizer, add the following lines to schema.xml:
> <analyzer type="index">
> <filter class="org.apache.lucene.analysis.ApostropheTokenFactory"/>
> ...
> </analyzer>