[ 
https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818593#comment-13818593
 ] 

Furkan KAMACI commented on SOLR-5332:
-------------------------------------

I've added preserveOriginal capability to EdgeNGramFilterFactory and attached a 
patch to SOLR-5152. I want to make clear something about the problem that is 
pointed at this issue. The schema that is described at here: 
http://lucene.472066.n3.nabble.com/Help-to-figure-out-why-query-does-not-match-td4086967.html
 uses LowerCaseFilterFactory before EdgeNGramFilterFactory. There is an 
explanation about it: 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LowerCaseTokenizerFactory
 and says that: "Creates tokens by lowercasing all letters and dropping 
non-letters." So non-letters will be dropped "before" tokens are retrieved by 
EdgeNGramFilterFactory. 

My patch preserves original token if preserveOriginal is set to true and token 
length is less than minGramSize or greater than maxGramSize.

> Add "preserve original" setting to the EdgeNGramFilterFactory
> -------------------------------------------------------------
>
>                 Key: SOLR-5332
>                 URL: https://issues.apache.org/jira/browse/SOLR-5332
>             Project: Solr
>          Issue Type: Wish
>            Reporter: Alexander S.
>
> Hi, as described here: 
> http://lucene.472066.n3.nabble.com/Help-to-figure-out-why-query-does-not-match-td4086967.html
>  the problem is in that if you have these 2 strings to index:
> 1. facebook.com/someuser.1
> 2. facebook.com/someveryandverylongusername
> and the edge ngram filter factory with min and max gram size settings 2 and 
> 25, search requests for these urls will fail.
> But search requests for:
> 1. facebook.com/someuser
> 2. facebook.com/someveryandverylonguserna
> will work properly.
> It's because first url has "1" at the end, which is lover than the allowed 
> min gram size. In the second url the user name is longer than the max gram 
> size (27 characters).
> Would be good to have a "preserve original" option, that will add the 
> original string to the index if it does not fit the allowed gram size, so 
> that "1" and "someveryandverylongusername" tokens will also be added to the 
> index.
> Best,
> Alex



--
This message was sent by Atlassian JIRA
(v6.1#6144)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to