[jira] [Commented] (SOLR-7136) Add an AutoPhrasing TokenFilter

Ted Sullivan (JIRA) Mon, 08 Jun 2015 07:43:21 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577267#comment-14577267
 ]


Ted Sullivan commented on SOLR-7136:
------------------------------------

Good points, Marcus. Yes, StringBuffer is obsolete (my bad; old coding habits
die hard, as it were :( ). The IDF issue you pointed out also needs to be
looked at, thanks. I also agree with your comment about String.replaceAll:
since this code works on queries, which are typically very small compared to
documents, I didn't think it would hurt much, but that is probably erroneous
thinking once you consider query load. As discussed below, this QParser
probably needs an overhaul at this point.
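For reference, a minimal Java sketch of the two fixes Marcus suggested, assuming the parser rewrites a multi-word phrase in the raw query string into one joined token. The class `PhraseRewriter` and the `_` separator are hypothetical, not taken from the patch. The regex is compiled once with `Pattern.compile` and reused, whereas `String.replaceAll` recompiles its regex on every call, and `StringBuilder` replaces `StringBuffer`, whose synchronization is wasted on a builder that never leaves one thread.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper, not code from the SOLR-7136 patch: rewrites one
 * multi-word phrase in a query string into a single joined token.
 */
final class PhraseRewriter {
    // Compiled once per class; String.replaceAll(regex, ...) would
    // recompile its regex on every call.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    private final Pattern phrase;      // compiled once per phrase, reused for every query
    private final String replacement;  // e.g. "new_york_city"

    PhraseRewriter(String phrase, char separator) {
        String[] words = WHITESPACE.split(phrase.trim().toLowerCase(Locale.ROOT));
        // StringBuilder instead of StringBuffer: these builders are local to
        // one thread, so StringBuffer's synchronization buys nothing.
        StringBuilder regex = new StringBuilder("\\b");
        StringBuilder joined = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                regex.append("\\s+");   // tolerate any run of whitespace between words
                joined.append(separator);
            }
            regex.append(Pattern.quote(words[i]));
            joined.append(words[i]);
        }
        regex.append("\\b");
        this.phrase = Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
        this.replacement = Matcher.quoteReplacement(joined.toString());
    }

    /** Replaces every occurrence of the phrase with its joined form. */
    String rewrite(String query) {
        return phrase.matcher(query).replaceAll(replacement);
    }
}
```

For example, `new PhraseRewriter("New York City", '_').rewrite("hotels in New  York City")` should yield `hotels in new_york_city`, while `newark york city` is left untouched.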

On edismax: you can set the defType in the AutophrasingQParser plugin to
edismax (it defaults to the lucene parser). That said, I have noticed some
issues with it: it mishandles even simple queries, and it really needs to be
rethought somewhat. One change that I will post soon lets it use different
Tokenizer implementations. The initial patch hard-codes WhitespaceTokenizer; I
think it should use StandardTokenizer as the default and allow other
implementations to be switched in via configuration.
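A minimal sketch of the configurable-tokenizer idea, written against plain Java strings rather than Lucene's `Tokenizer` API: the tokenizer is a `Function<String, List<String>>` chosen by a configuration name. The names `"whitespace"` and `"standard"`, the class `TokenizerChoice`, and the letters-and-digits splitter are all hypothetical. The splitter is only a crude stand-in for `StandardTokenizer`, which implements Unicode UAX #29 word breaking. It shows one concrete reason the default matters: whitespace splitting leaves trailing punctuation attached, so `York,` no longer matches the phrase `new york`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: the tokenizer is a pluggable function chosen by a
 * configuration name instead of being hard-coded.
 */
final class TokenizerChoice {
    private static final Pattern WS = Pattern.compile("\\s+");
    // Any run of characters that are neither letters nor digits.
    private static final Pattern NON_WORD = Pattern.compile("[^\\p{L}\\p{N}]+");

    /** Splits on whitespace only, roughly what WhitespaceTokenizer does. */
    static final Function<String, List<String>> WHITESPACE = text -> splitOn(WS, text);

    /** Crude stand-in for StandardTokenizer (NOT a UAX #29 implementation). */
    static final Function<String, List<String>> LETTERS_AND_DIGITS = text -> splitOn(NON_WORD, text);

    private TokenizerChoice() {}

    /** Maps a hypothetical configuration value to a tokenizer. */
    static Function<String, List<String>> forName(String name) {
        switch (name) {
            case "whitespace": return WHITESPACE;
            case "standard":   return LETTERS_AND_DIGITS;
            default: throw new IllegalArgumentException("unknown tokenizer: " + name);
        }
    }

    /** True if the words of phrase appear consecutively in tokens (case-insensitive). */
    static boolean containsPhrase(List<String> tokens, String... phrase) {
        outer:
        for (int i = 0; i + phrase.length <= tokens.size(); i++) {
            for (int j = 0; j < phrase.length; j++) {
                if (!tokens.get(i + j).equalsIgnoreCase(phrase[j])) continue outer;
            }
            return true;
        }
        return false;
    }

    private static List<String> splitOn(Pattern delimiter, String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : delimiter.split(text)) {
            if (!t.isEmpty()) tokens.add(t); // drop the empty leading piece split() can produce
        }
        return tokens;
    }
}
```

With the input `I love New York, really`, the whitespace tokenizer produces the token `York,` and the phrase check fails, while the letters-and-digits splitter produces `York` and the check succeeds.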

> Add an AutoPhrasing TokenFilter
> -------------------------------
>
>                 Key: SOLR-7136
>                 URL: https://issues.apache.org/jira/browse/SOLR-7136
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Ted Sullivan
>         Attachments: SOLR-7136.patch, SOLR-7136.patch, SOLR-7136.patch
>
>
> Adds an 'autophrasing' token filter designed to let noun phrases that
> represent a single entity be tokenized as a single token. Also adds support
> for ManagedResources and query-parser auto-phrasing support, given
> LUCENE-2605.
> The rationale for this TokenFilter, and its use in solving the long-standing
> multi-term synonym problem in Lucene/Solr, is documented online:
> http://lucidworks.com/blog/automatic-phrase-tokenization-improving-lucene-search-precision-by-more-precise-linguistic-analysis/
> https://lucidworks.com/blog/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
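To make the idea in the description concrete, here is a sketch of greedy longest-match auto-phrasing over an already-tokenized list of words. It is plain Java over `List<String>`, not a Lucene `TokenFilter`, and it ignores positions, offsets, and whatever options the attached patch supports; the class name `AutoPhraseSketch` and the `_` separator are assumptions. Because the classic query parser splits on whitespace before analysis (the subject of LUCENE-2605), a filter like this would only ever see one word at a time at query time, which is why query-parser support is needed as well.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch (not the SOLR-7136 code): greedy longest-match
 * auto-phrasing over a list of tokens.
 */
final class AutoPhraseSketch {
    private final Set<String> phrases = new HashSet<>(); // lowercased, single-space separated
    private final int maxWords;                           // word count of the longest phrase
    private final String separator;

    AutoPhraseSketch(List<String> phraseList, char separator) {
        int max = 1;
        for (String p : phraseList) {
            String[] words = p.trim().toLowerCase(Locale.ROOT).split("\\s+");
            phrases.add(String.join(" ", words));
            max = Math.max(max, words.length);
        }
        this.maxWords = max;
        this.separator = String.valueOf(separator);
    }

    /** Returns a new token list with every known phrase collapsed into one token. */
    List<String> apply(List<String> tokens) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matched = 0;
            // Try the longest candidate first so "seat belt buckle" beats "seat belt".
            for (int n = Math.min(maxWords, tokens.size() - i); n >= 2; n--) {
                String candidate = String.join(" ", tokens.subList(i, i + n)).toLowerCase(Locale.ROOT);
                if (phrases.contains(candidate)) {
                    matched = n;
                    break;
                }
            }
            if (matched > 0) {
                // Emit the whole phrase as one token, e.g. "seat_belt_buckle".
                out.add(String.join(separator, tokens.subList(i, i + matched)).toLowerCase(Locale.ROOT));
                i += matched;
            } else {
                out.add(tokens.get(i)); // no known phrase starts here: pass the word through
                i++;
            }
        }
        return out;
    }
}
```

With the phrases `seat belt` and `seat belt buckle`, the tokens `my Seat belt buckle broke` become `my`, `seat_belt_buckle`, `broke`: the longer phrase wins because candidates are tried from longest to shortest.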



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

