Removing QueryAutoTruncate did the trick, everything looks relevant now!

2018-04-05 11:09 GMT+02:00 Nicolas Legrand <nicolas.legr...@bulac.fr>:
> For what it's worth, we also use Latin-script languages and find the
> results more relevant without a star, or at least with the queries of 17.05
> :).
>
> 2018-04-04 13:10 GMT+02:00 Nick Clemens <n...@bywatersolutions.com>:
>
>> Interesting, yes, the star was added to support auto_truncation and is
>> enabled by default. For languages using Latin scripts we need the star,
>> otherwise a search for "cat" will not return results containing "cats".
>>
>> I am not sure what the path to correcting this is - I think you should
>> file a bug report with this info and we can take a deeper look into how we
>> are building our searches and what we can do.
>>
>> On Tue, Apr 3, 2018 at 10:22 AM Nicolas Legrand <nicolas.legr...@bulac.fr>
>> wrote:
>>
>>> Good day devs,
>>>
>>> Nick spotted this one during the last Marseille Hackfest. We ran some
>>> tests with our catalogue on master and found out how to reproduce it, how
>>> to break it and how to fix it, though the inner mechanics remain a mystery
>>> and we are not quite sure what the default behaviour should be.
>>>
>>> We did our test with 中國翻譯 (Chinese Translators Journal), which has two
>>> words that occur very frequently in our catalogue: China and translation.
>>>
>>> First, the default Koha behaviour is to add a "*" at the end of the
>>> searched word, which leads to 0 results. It produces a query looking like
>>> this one:
>>>
>>> $ curl "http://localhost:9200/koha_robin_biblios/_search?pretty" -d \
>>>   '{"from": 0, "size": 0,"query":{"query_string":{"query": "中國翻譯*"}}}'
>>> {
>>>   "took" : 1,
>>>   "timed_out" : false,
>>>   "_shards" : {
>>>     "total" : 5,
>>>     "successful" : 5,
>>>     "skipped" : 0,
>>>     "failed" : 0
>>>   },
>>>   "hits" : {
>>>     "total" : 0,
>>>     "max_score" : 0.0,
>>>     "hits" : [ ]
>>>   }
>>> }
>>>
>>> If we quote 中國翻譯 in Koha, it yields one answer, the right one. It
>>> produces a query looking like this one:
>>>
>>> $ curl "http://bouse02.prive.bulac.fr:9200/koha_robin_biblios/_search?pretty" -d \
>>>   '{"from": 0, "size": 0,"query":{"query_string":{"query": "\"中國翻譯\""}}}'
>>> {
>>>   "took" : 5,
>>>   "timed_out" : false,
>>>   "_shards" : {
>>>     "total" : 5,
>>>     "successful" : 5,
>>>     "skipped" : 0,
>>>     "failed" : 0
>>>   },
>>>   "hits" : {
>>>     "total" : 1,
>>>     "max_score" : 0.0,
>>>     "hits" : [ ]
>>>   }
>>> }
>>>
>>> Note that if I write an Elasticsearch query without quotes or star, it
>>> yields too many results (9626), and the “right” result isn't among the
>>> first ten:
>>>
>>> $ curl "http://bouse02.prive.bulac.fr:9200/koha_robin_biblios/_search?pretty" -d \
>>>   '{"from": 0, "size": 0,"query":{"query_string":{"query": "中國翻譯"}}}'
>>> {
>>>   "took" : 16,
>>>   "timed_out" : false,
>>>   "_shards" : {
>>>     "total" : 5,
>>>     "successful" : 5,
>>>     "skipped" : 0,
>>>     "failed" : 0
>>>   },
>>>   "hits" : {
>>>     "total" : 9626,
>>>     "max_score" : 0.0,
>>>     "hits" : [ ]
>>>   }
>>> }
>>>
>>> I'm not sure what the right behaviour should be. We felt that adding
>>> quotes made our results much more relevant, whatever the language.
>>> What is certain is that adding a star to the search by default doesn't help
>>> us. We didn't have this problem with Elasticsearch while playing with it in
>>> 17.05. For us, it is a regression. I have attached the MARC of our test
>>> record.
>>>
>>> What do you think about it?
>>>
>>> Best regards,
>>>
>>> --
>>>
>>> *Nicolas Legrand*
>>> Technical administration and development of the library management
>>> system
>>>
>>> Bibliothèque universitaire
>>> des langues et civilisations
>>>
>>> 65 rue des Grands Moulins
>>> F-75013 PARIS
>>> T +33 1 81 69 *18 22*
>>> www.bulac.fr
>>
>> --
>> Nick Clemens
>> Sonic Screwdriver (Development Support)
>> ByWater Solutions
>> IRC: kidclamp

--
*Nicolas Legrand*
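
For anyone who wants to compare the three query shapes discussed in this thread (trailing star, quoted phrase, bare term), here is a minimal Python sketch that builds the same request bodies as the curl examples. The function name `query_string_body` and its `mode` argument are hypothetical names for illustration only; this is not Koha code, and nothing here contacts Elasticsearch.

```python
import json


def query_string_body(term: str, mode: str = "plain") -> dict:
    """Build a query_string search body like the curl examples in this thread.

    `mode` is a hypothetical switch used only in this sketch:
      "truncate" -> the term followed by "*" (what auto-truncation sends)
      "phrase"   -> the term wrapped in double quotes
      "plain"    -> the term unchanged
    """
    if mode == "truncate":
        query = term + "*"
    elif mode == "phrase":
        # Escape embedded double quotes so the phrase stays one unit.
        query = '"' + term.replace('"', '\\"') + '"'
    elif mode == "plain":
        query = term
    else:
        raise ValueError(f"unknown mode: {mode}")
    # "size": 0 asks only for the hit count, as in the original requests.
    return {"from": 0, "size": 0, "query": {"query_string": {"query": query}}}


if __name__ == "__main__":
    for mode in ("truncate", "phrase", "plain"):
        body = query_string_body("中國翻譯", mode)
        # ensure_ascii=False keeps the CJK characters readable in the output.
        print(mode, json.dumps(body, ensure_ascii=False))
```

Each printed body can be passed to curl with -d against the index's _search endpoint to compare hit counts, as in the requests quoted above.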
_______________________________________________
Koha-devel mailing list
Koha-devel@lists.koha-community.org
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/