[ 
https://issues.apache.org/jira/browse/SOLR-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562427#comment-14562427
 ] 

Markus Jelsma commented on SOLR-7136:
-------------------------------------

Hi Ted - this is another interesting approach to the typical problem. I was 
thinking about the repercussions your token filter has on IDF values. Surely, 
phrases will get an inflated score because they become much rarer than their 
constituent terms, which seems like a good thing. I do have a problem with the 
query parser, though: it won't work in multi-language environments, and it 
doesn't interact with edismax, which is presumably the de facto parser for 
free-text input.
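
To make the IDF point concrete, here is a minimal sketch, assuming the classic 
Lucene TF-IDF formula idf = 1 + ln(numDocs / (docFreq + 1)). The class name 
and all document counts are made up for illustration and are not taken from 
the patch or from any real index.

```java
// Hypothetical sketch, not part of the patch. All counts are invented.
final class IdfSketch {

    // Classic TF-IDF style inverse document frequency (assumed similarity).
    static double idf(long numDocs, long docFreq) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        long numDocs = 1_000_000L;
        long dfTerm = 50_000L;  // assumed docFreq of one constituent term
        long dfPhrase = 500L;   // assumed docFreq of the auto-phrased token

        // The rarer single-token phrase receives the higher IDF.
        System.out.printf("term idf=%.2f, phrase idf=%.2f%n",
                idf(numDocs, dfTerm), idf(numDocs, dfPhrase));
    }
}
```

With a different similarity the numbers change, but a token that occurs in 
fewer documents should still be weighted higher than its constituents.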

Also, your latest patch uses a StringBuffer in the token filter. I believe you 
should rely on StringBuilder instead, since you don't need thread-safety at 
that point. Another thing is the usage of String.replaceAll(String, String) in 
the parser: isn't that going to eat cycles we should spare?
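
Roughly what I have in mind, as a sketch only: String.replaceAll compiles its 
regex on every call, so a precompiled Pattern avoids the repeated work, and 
StringBuilder does the same job as StringBuffer without synchronization. The 
class and method names below are hypothetical and do not come from the patch, 
and the whitespace regex is just an assumed example.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not taken from the patch.
final class PhraseStrings {

    // Compiled once and reused; Pattern instances are safe to share.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Joins tokens with a separator using an unsynchronized StringBuilder.
    static String join(List<String> tokens, char separator) {
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        for (String token : tokens) {
            if (!first) {
                sb.append(separator);
            }
            sb.append(token);
            first = false;
        }
        return sb.toString();
    }

    // Same result as input.replaceAll("\\s+", replacement), minus the
    // per-call regex compilation.
    static String collapseWhitespace(String input, String replacement) {
        return WHITESPACE.matcher(input).replaceAll(replacement);
    }
}
```

Whether this matters in practice depends on how often the parser hits that 
code path, so it would be worth measuring before and after.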

> Add an AutoPhrasing TokenFilter
> -------------------------------
>
>                 Key: SOLR-7136
>                 URL: https://issues.apache.org/jira/browse/SOLR-7136
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Ted Sullivan
>         Attachments: SOLR-7136.patch, SOLR-7136.patch, SOLR-7136.patch
>
>
> Adds an 'autophrasing' token filter which is designed to enable noun phrases 
> that represent a single entity to be tokenized in a singular fashion. Adds 
> support for ManagedResources and Query parser auto-phrasing support given 
> LUCENE-2605.
> The rationale for this Token Filter and its use in solving the long-standing 
> multi-term synonym problem in Lucene/Solr has been documented online:
> http://lucidworks.com/blog/automatic-phrase-tokenization-improving-lucene-search-precision-by-more-precise-linguistic-analysis/
> https://lucidworks.com/blog/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
