Re: Query with exact number of tokens

2018-09-24 Thread Sergio García Maroto
Thanks, all, for your ideas. It was very useful information. On Fri, 21 Sep 2018 at 19:04, Jan Høydahl wrote: > I have made a FieldType specially for this https://github.com/cominvent/exactmatch/ > -- > Jan Høydahl, search solution architect

Re: Query with exact number of tokens

2018-09-21 Thread Jan Høydahl
I have made a FieldType specially for this https://github.com/cominvent/exactmatch/ -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com > On 21 Sep 2018 at 18:14, Steve Rowe wrote: > > Link correction - wrong fragment identifier

Re: Query with exact number of tokens

2018-09-21 Thread Steve Rowe
Link correction - wrong fragment identifier in ref #5 - should be: [5] https://lucene.apache.org/solr/guide/7_4/other-parsers.html#function-range-query-parser -- Steve www.lucidworks.com > On Sep 21, 2018, at 12:04 PM, Steve Rowe wrote: > > Hi Sergio, > > Chris “Hoss” Hostetter has a

Re: Query with exact number of tokens

2018-09-21 Thread Steve Rowe
Hi Sergio, Chris “Hoss” Hostetter has a solution to this kind of problem here: https://lists.apache.org/thread.html/6b0f0cb864aa55f0a9eadfd92d27d374ab8deb16e8131ed2b7234463@%3Csolr-user.lucene.apache.org%3E . See also the suggestions in comments on SOLR-12673[1], which include a version of

Re: Query with exact number of tokens

2018-09-21 Thread Walter Underwood
How about sorting the tokens in alphabetical order both for indexing and query, then using the sentinel trick?

Source text: CENTURY BANCORP, INC
Solr text: SENTINEL bancorp century inc SENTINEL

wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep
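Walter's sort-then-sentinel idea can be simulated outside Solr to see why it rejects extra words. This is a minimal Python sketch, assuming a simple lowercase/split-on-non-word analysis as a stand-in for the real Solr chain; `tokenize`, `sentinel_sorted` and `phrase_match` are hypothetical names, and `phrase_match` only imitates what a phrase query over the indexed field would do.

```python
import re

SENTINEL = "SENTINEL"  # marker token; uppercase so it cannot collide with the lowercased words


def tokenize(text: str) -> list[str]:
    # Rough stand-in for a Solr analysis chain: lowercase, split on non-word characters
    return re.findall(r"\w+", text.lower())


def sentinel_sorted(text: str) -> list[str]:
    # Sort the tokens so word order no longer matters, then bracket them with sentinels
    return [SENTINEL] + sorted(tokenize(text)) + [SENTINEL]


def phrase_match(doc: list[str], query: list[str]) -> bool:
    # A phrase query matches if the query tokens occur contiguously in the document
    n = len(query)
    return any(doc[i:i + n] == query for i in range(len(doc) - n + 1))


if __name__ == "__main__":
    q = sentinel_sorted("CENTURY BANCORP, INC.")
    print(" ".join(q))
    for name in ["CENTURY BANCORP, INC", "NEW CENTURY BANCORP, INC.", "INC CENTURY BANCORP"]:
        print(name, phrase_match(sentinel_sorted(name), q))
```

Because the sentinels must sit directly next to the first and last sorted tokens, any extra word such as "new" breaks the phrase. Sorting also makes word order irrelevant, so "INC CENTURY BANCORP" matches too, which may or may not be desirable for company names.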

Re: Query with exact number of tokens

2018-09-21 Thread Alexandre Rafalovitch
Hmm, I was suggesting putting a TokenCountingFilter at the end of both the indexing and query chains for the same field (e.g. name_count). Then the search would be something like (warning, major syntax errors): .../select?queryname=CENTURY BANCORP, INC& q=*:* fq={!eDisMax v=queryname mm=100%}name&
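To see why counting tokens on both sides can work, here is a minimal Python simulation. The TokenCountingFilter mentioned above is modelled as a hypothetical final step that collapses the whole token stream into a single token holding the count, so identical counts produce identical terms on the name_count field. A plain lowercase/word split stands in for the real analysis chain, and `base_chain`, `count_chain` and `matches` are illustrative names, not Solr APIs.

```python
import re


def base_chain(text: str) -> list[str]:
    # Stand-in for the field's normal analysis: lowercase and split on non-word characters
    return re.findall(r"\w+", text.lower())


def count_chain(text: str) -> list[str]:
    # Hypothetical token-counting step: the stream becomes one token, its length
    return [str(len(base_chain(text)))]


def matches(doc_name: str, query_name: str) -> bool:
    # Clause 1 (like mm=100% on "name"): every query token must appear in the document
    all_terms = set(base_chain(query_name)) <= set(base_chain(doc_name))
    # Clause 2 (term match on "name_count"): both sides must produce the same count token
    same_count = count_chain(doc_name) == count_chain(query_name)
    return all_terms and same_count


if __name__ == "__main__":
    q = "CENTURY BANCORP, INC."
    for name in ["CENTURY BANCORP, INC.", "NEW CENTURY BANCORP, INC."]:
        print(name, matches(name, q))
```

One caveat of combining a set-of-terms check with a count: in this simulation a query with a repeated word, such as "INC INC BANCORP", still matches "CENTURY BANCORP, INC.", since each distinct query term is present and both sides have three tokens.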

Re: Query with exact number of tokens

2018-09-21 Thread Erick Erickson
A variant on Alexandre's approach: at index time, count yourself the tokens that will be produced (this may be a little tricky; for instance, you shouldn't have WordDelimiterFilterFactory in your analysis chain). Put the number of tokens in a separate field. At query time, you'd search
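A minimal Python sketch of this variant, assuming hypothetical fields named `name` and `name_count` and a query side that combines eDisMax `mm=100%` with a filter on the stored count. The message is cut off before the query-time details, so that pairing is an assumption. `token_count` must mirror the field's real analysis chain, which is exactly the tricky part Erick warns about; the simple word split here is only a placeholder.

```python
import re
from urllib.parse import urlencode


def token_count(name: str) -> int:
    # Must mirror the field's analysis chain; a lowercase/word split is assumed here
    return len(re.findall(r"\w+", name.lower()))


def index_doc(doc_id: str, name: str) -> dict:
    # Store the precomputed count next to the name (field names are hypothetical)
    return {"id": doc_id, "name": name, "name_count": token_count(name)}


def build_query(name: str) -> str:
    # Standard Solr request parameters; the field names are illustrative assumptions
    params = {
        "defType": "edismax",
        "qf": "name",
        "q": name,
        "mm": "100%",                              # every query term must match
        "fq": f"name_count:{token_count(name)}",  # and the doc must have no extra tokens
    }
    return urlencode(params)


if __name__ == "__main__":
    print(index_doc("1", "NEW CENTURY BANCORP, INC."))
    print(build_query("CENTURY BANCORP, INC."))
```

Precomputing the count on the client keeps the schema simple, at the cost of having to keep `token_count` in sync with the analyzer whenever the analysis chain changes.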

Re: Query with exact number of tokens

2018-09-21 Thread Alexandre Rafalovitch
I think you can match everything in the query to the field using either:
1) disMax/eDisMax with mm=100%: https://lucene.apache.org/solr/guide/7_4/the-dismax-query-parser.html#mm-minimum-should-match-parameter
2) Complex Phrase Query Parser with inOrder=false:
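The limitation Michael points out in his reply can be seen by modelling mm=100% on its own. In this simplified Python model (the helper names are hypothetical, and a lowercase/word split stands in for analysis), mm=100% only requires every query term to be present in the document; it places no limit on extra document terms.

```python
import re


def tokens(text: str) -> list[str]:
    # Simplified analysis: lowercase, split on non-word characters
    return re.findall(r"\w+", text.lower())


def mm_all(doc: str, query: str) -> bool:
    # mm=100%: all query terms must occur in the document; extra document terms are allowed
    return set(tokens(query)) <= set(tokens(doc))


if __name__ == "__main__":
    q = "CENTURY BANCORP, INC."
    for name in ["CENTURY BANCORP, INC.", "NEW CENTURY BANCORP, INC.", "CENTURY BANK"]:
        print(name, mm_all(name, q))
```

"NEW CENTURY BANCORP, INC." passes this check, which is why mm=100% alone needs something extra (a token count, sentinels, or an untokenized field) to enforce an exact number of tokens.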

Re: Query with exact number of tokens

2018-09-21 Thread Michael Kuhlmann
Hi Sergio, alas, that's not possible that way. If you search for CENTURY BANCORP, INC., Solr will be totally happy to find all these terms in "NEW CENTURY BANCORP, INC." and return it with a high score. But you can prepare your data at index time. Make it a multivalued field of type string
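Michael's message is cut off before the details, so the following is only one possible reading: index a normalized form of the name into an untokenized string field, normalize the query the same way, and require an exact match on the whole value. The normalization rules (lowercase, drop punctuation, single spaces) and the helper names are assumptions for illustration; a multivalued field would let you store several such variants per company, while this sketch stores just one.

```python
import re


def normalize(name: str) -> str:
    # Assumed normalization: lowercase, keep word characters, join with single spaces
    return " ".join(re.findall(r"\w+", name.lower()))


def index_values(name: str) -> list[str]:
    # Values for a hypothetical multivalued string field (one variant in this sketch)
    return [normalize(name)]


def exact_match(doc_name: str, query: str) -> bool:
    # A string field matches whole values only, so an extra word such as "NEW" breaks the match
    return normalize(query) in index_values(doc_name)


if __name__ == "__main__":
    print(exact_match("CENTURY BANCORP, INC.", "Century Bancorp Inc"))
    print(exact_match("NEW CENTURY BANCORP, INC.", "CENTURY BANCORP, INC."))
```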

Re: Query with exact number of tokens

2018-09-21 Thread Andrea Gazzarini
Oops, sorry... I read too quickly and didn't read the second part. Please forget my answer ;) Andrea On 21/09/18 15:52, Andrea Gazzarini wrote: Hi Sergio, assuming that you don't want to disable tokenisation (otherwise you can define the indexed field as a string and search it as a

Re: Query with exact number of tokens

2018-09-21 Thread Andrea Gazzarini
Hi Sergio, assuming that you don't want to disable tokenisation (otherwise you can define the indexed field as a string and search it as a whole), in "Relevant Search" the authors describe a cool approach using so-called "Sentinel Tokens", which are symbolic tokens representing the
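A minimal Python sketch of the sentinel-token idea, assuming one marker at the start and one at the end of the field. The excerpt does not give the exact markers used in "Relevant Search", so `SENTINEL_BEGIN` and `SENTINEL_END` are hypothetical, and a lowercase/word split stands in for the analysis chain. A phrase query that includes both markers can only match a field containing exactly those tokens, in that order.

```python
import re

BEGIN, END = "SENTINEL_BEGIN", "SENTINEL_END"  # uppercase, so they never equal a lowercased word


def with_sentinels(text: str) -> list[str]:
    # Apply the same wrapping at index time and query time
    return [BEGIN] + re.findall(r"\w+", text.lower()) + [END]


def phrase_match(doc: list[str], query: list[str]) -> bool:
    # Imitates a phrase query: query tokens must appear contiguously and in order
    n = len(query)
    return any(doc[i:i + n] == query for i in range(len(doc) - n + 1))


if __name__ == "__main__":
    q = with_sentinels("CENTURY BANCORP, INC.")
    for name in ["CENTURY BANCORP, INC.", "NEW CENTURY BANCORP, INC.", "INC CENTURY BANCORP"]:
        print(name, phrase_match(with_sentinels(name), q))
```

Word order matters in this form; Walter Underwood's reply in this thread suggests sorting the tokens first if order should be ignored.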

Query with exact number of tokens

2018-09-21 Thread marotosg
Hi, I have to search for company names where my first requirement is to find only exact matches on the company name. For instance, if I search for "CENTURY BANCORP, INC." I shouldn't find "NEW CENTURY BANCORP, INC." because the result company has the extra keyword "NEW". I can't use exact match