https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38101
--- Comment #6 from Thomas Klausner <[email protected]> ---

I have now (finally, sorry) looked at the mappings as used in ktd, and it seems that the sub-fields of `note` (the search field that `500` is indexed into by default) are indeed set up as `type: keyword`.

Start ktd, then `ktd --shell`:

```
kohadev-koha@kohadevbox:koha(main)$ curl -s 'http://es:9200/koha_kohadev_biblios/_mapping/field/note?pretty'
{
  "koha_kohadev_biblios" : {
    "mappings" : {
      "note" : {
        "full_name" : "note",
        "mapping" : {
          "note" : {
            "type" : "text",
            "fields" : {
              "ci_raw" : {
                "type" : "keyword",
                "normalizer" : "icu_folding_normalizer"
              },
              "phrase" : {
                "type" : "text",
                "analyzer" : "analyzer_phrase"
              },
              "raw" : {
                "type" : "keyword",
                "normalizer" : "nfkc_cf_normalizer"
              }
            },
            "analyzer" : "analyzer_standard"
          }
        }
      }
    }
  }
}
```

Here you see that the fields "raw" and "ci_raw" are of type "keyword".

Now, to test if this is indeed the problem we have to fiddle with the Elasticsearch mappings, which is not very easy, because the web interface does not have any effect on the actual mappings, which are stored in `admin/searchengine/elasticsearch/mappings.yaml`.

BUT we actually don't care that much about the mappings (i.e. which MARC21 fields go into which search field). We care about the definition of the "note" field, which has no 'type', so it uses the default type, which we find in `admin/searchengine/elasticsearch/field_config.yaml`:

```
default:
  type: text
  analyzer: analyzer_standard
  search_analyzer: analyzer_standard
  fields:
    phrase:
      type: text
      analyzer: analyzer_phrase
      search_analyzer: analyzer_phrase
    raw:
      type: keyword
      normalizer: nfkc_cf_normalizer
    ci_raw:
      type: keyword
      normalizer: icu_folding_normalizer
```

Because I'm currently just exploring, I first deleted only `raw` and `ci_raw`, but (spoiler alert) this wasn't enough, because `analyzer_phrase` has the same problem.
So I removed all the subfields from "default", leaving only:

```
default:
  type: text
  analyzer: analyzer_standard
  search_analyzer: analyzer_standard
```

Now I can recreate the ES index:

    kohadev-koha@kohadevbox:koha(main)$ perl misc/search_tools/rebuild_elasticsearch.pl -r

And re-index my test entry (where I added ~40k of text to 500):

    perl misc/search_tools/rebuild_elasticsearch.pl --biblios --bn 284 -v -v

And it works!! I can find the book when I search for some of the text I entered (even if the text is at the end of the 40k).

BUT (a very big BUT): This is NOT the proper solution, just a proof that the problem lies in the usage of `keyword` and/or `analyzer_phrase` (where `analyzer_phrase` is defined in `admin/searchengine/elasticsearch/index_config.yaml` and also uses `keyword`).

One thing we could (easily) do is to use `ignore_above` for type=keyword (which would behave similarly to your patch, in that it drops too-long text):

    raw:
      type: keyword
      normalizer: nfkc_cf_normalizer
      ignore_above: 20000
    ci_raw:
      type: keyword
      normalizer: icu_folding_normalizer
      ignore_above: 20000

But this does not work for `analyzer_phrase` :-(

I guess the correct (but very hard) solution would be to figure out why and where we need those subfields (esp. "phrase", but also "raw" and "ci_raw"), decide if we can use `ignore_above` for "raw" and "ci_raw", and figure out a fix for "phrase".

Or, much easier: we define a new search type "long_text" which does not include all those subfields (and therefore will not support a phrase search). Then you can change the search_mappings for "note" on your instance from "default" to "long_text" and everything should work. Or we might even decide that "note" should be a "long_text" by default.

Unfortunately Elasticsearch is a complex beast, and the Koha ES implementation has a lot of improvement opportunities (let's call it that...)
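For illustration, the "long_text" idea might look roughly like the fragment below in `admin/searchengine/elasticsearch/field_config.yaml`. This is only a sketch under assumptions: `long_text` is the hypothetical type name proposed above and does not exist in Koha today, the entry is assumed to sit next to `default` at the same nesting level, and it simply mirrors the stripped-down definition that worked in the test. Koha would presumably also need changes elsewhere to accept a new search field type, and those are not shown.

```
# Hypothetical entry, placed alongside "default" (not part of Koha as shipped)
long_text:
  type: text
  analyzer: analyzer_standard
  search_analyzer: analyzer_standard
  # Deliberately no "fields:" block: without the phrase, raw and ci_raw
  # subfields there is no keyword-based indexing of the full text,
  # at the cost of losing phrase search and exact matching on this field.
```

As with the experiment above, a change like this would only take effect after recreating the index and re-indexing the affected records.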
