[ https://issues.apache.org/jira/browse/SOLR-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15018207#comment-15018207 ]
Ted Sullivan commented on SOLR-7136:
------------------------------------

Thanks for this submission [~kvakhsho...@gmail.com]! I think this really helps the autophrasing solution scale. The improved handling of PositionLength is also a big plus, as are the improvements to the query parser. Great work, thanks.

I have seen some reports on the GitHub version of my code about memory leaks. Have you looked into that? I will take your patch and run some A/B comparisons to see whether the new FSM implementation (hopefully) removes that problem too. More generally, have you done any performance or scaling tests on your version of the autophrasing filter? That kind of testing goes hand in hand with the production readiness that your new implementation makes possible. Thanks again for submitting this patch.

As to complementarity with SOLR-4381, I agree; it's nice to hear that the two solutions play nicely with each other :) IMO this is an important problem that needs a committed solution. The more ways we give Solr users to "skin the cat", the better the chance that they will find a solution for their own problem set.

As for the acronym 'DC': it is also ambiguous because it can stand for "District of Columbia". Domain context will certainly help disambiguate it, but not if you have a global search problem like Google or Bing. I'll look into this problem too.

> Add an AutoPhrasing TokenFilter
> -------------------------------
>
>                 Key: SOLR-7136
>                 URL: https://issues.apache.org/jira/browse/SOLR-7136
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Ted Sullivan
>         Attachments: AutoPhaseFiniteStateDiagram.pdf, SOLR-7136.patch,
> SOLR-7136.patch, SOLR-7136.patch, SOLR-7136.patch
>
>
> Adds an 'autophrasing' token filter which is designed to enable noun phrases
> that represent a single entity to be tokenized in a singular fashion. Adds
> support for ManagedResources and Query parser auto-phrasing support given
> LUCENE-2605.
> The rationale for this Token Filter and its use in solving the long standing
> multi-term synonym problem in Lucene Solr has been documented online.
> http://lucidworks.com/blog/automatic-phrase-tokenization-improving-lucene-search-precision-by-more-precise-linguistic-analysis/
> https://lucidworks.com/blog/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
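The issue description says the filter tokenizes a multi-word noun phrase that names a single entity as one token. As a rough illustration only, here is a plain-Java sketch of that idea, assuming whitespace-tokenized input and a fixed phrase list: a token trie (a small finite-state machine over tokens) finds the longest known phrase starting at each position and joins its tokens with a separator. The class name `AutoPhraseSketch`, the `_` separator, and the greedy longest-match policy are assumptions made for this example. It is not the Lucene `TokenFilter` API, it ignores position increments and PositionLength, and it is not the FSM implementation in the SOLR-7136 patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical illustration of auto-phrasing (not the SOLR-7136 code):
 * known phrases are stored in a token trie, and the longest phrase matching
 * at each position is emitted as a single joined token.
 */
public class AutoPhraseSketch {

    // One trie node per token prefix; 'terminal' marks the end of a known phrase.
    static final class Node {
        final Map<String, Node> next = new HashMap<>();
        boolean terminal;
    }

    private final Node root = new Node();
    private final String separator;

    public AutoPhraseSketch(List<String> phrases, String separator) {
        this.separator = separator;
        for (String phrase : phrases) {
            Node node = root;
            for (String token : phrase.split("\\s+")) {
                node = node.next.computeIfAbsent(token, k -> new Node());
            }
            node.terminal = true;
        }
    }

    /** Greedily replaces the longest matching multi-token phrase at each position. */
    public List<String> apply(List<String> tokens) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            Node node = root;
            int longestEnd = -1; // exclusive end index of the longest phrase seen so far
            for (int j = i; j < tokens.size(); j++) {
                node = node.next.get(tokens.get(j));
                if (node == null) break;          // no known phrase continues with this token
                if (node.terminal) longestEnd = j + 1;
            }
            if (longestEnd > i + 1) {             // matched a phrase of two or more tokens
                out.add(String.join(separator, tokens.subList(i, longestEnd)));
                i = longestEnd;
            } else {                              // no phrase here: pass the token through
                out.add(tokens.get(i));
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        AutoPhraseSketch ap = new AutoPhraseSketch(
                Arrays.asList("new york", "new york city", "district of columbia"), "_");
        List<String> input = Arrays.asList(
                "new", "york", "city", "hotels", "in", "district", "of", "columbia");
        System.out.println(ap.apply(input));
    }
}
```

With that phrase list, `main` prints `[new_york_city, hotels, in, district_of_columbia]`: the longer match "new york city" wins over "new york". A production filter might also need to decide whether to emit the individual tokens alongside the phrase token, which this sketch does not attempt.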