[Koha-bugs] [Bug 18969] Elasticsearch - _all field is deprecated - should use copy_to to prepare for ES6

bugzilla-daemon Thu, 12 Apr 2018 13:10:40 -0700

https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=18969


--- Comment #14 from David Gustafsson <[email protected]> ---
Ok! I actually think that would be a bad idea, mainly for the following
reasons:

1) Elasticsearch uses a ranking function called Okapi BM25 (it used to be Term
Frequency/Inverse Document Frequency (TF/IDF), which is similar but simpler to
understand). Two of the parameters Okapi BM25 uses to calculate the relevancy
score (per field) are average field length and inverse document frequency
(IDF). If you put all values in one field, average field length and inverse
document frequency will be averaged out across all fields, effectively
crippling the algorithm and rendering it unable to calculate relevancy properly.

2) You will also not be able to use per field boosting, unless you add boosted
fields to "fields" as well, but then you might as well skip the "_all_*" fields
and pass along the full list of fields instead.

3) The index will be about 3x as big, increasing memory usage. This might not
be a huge issue, but it could be for us, for example, as we have several
million biblios and already quite a large index.

4) To utilize the full power of Elasticsearch, one would want to be able to use
different analyzers/normalizers and other useful mapping settings on a per
field basis, and nice query string query options like "quote_field_suffix".
With everything in one field, all data will be indexed using the same mapping
settings, and features like quote_field_suffix will not work.
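
To make point 1 concrete, here is a minimal sketch of the textbook BM25
formula applied to a toy corpus. It is not Elasticsearch's actual
implementation; the documents, the field names ("title", "note", "all") and
the helper functions are all made up for illustration, and k1=1.2, b=0.75 are
the commonly cited defaults. It compares scoring a term per field with scoring
it in one merged field:

```python
import math

def bm25_term_score(tf, doc_len, avg_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Textbook BM25 score for one term in one field of one document."""
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    length_norm = k1 * (1 - b + b * doc_len / avg_len)
    return idf * tf * (k1 + 1) / (tf + length_norm)

def score(docs, doc_index, field, term):
    """Score `term` in `field`, using statistics of that field only."""
    lengths = [len(d[field]) for d in docs]
    avg_len = sum(lengths) / len(lengths)
    doc_freq = sum(1 for d in docs if term in d[field])
    tokens = docs[doc_index][field]
    return bm25_term_score(tokens.count(term), len(tokens), avg_len,
                           len(docs), doc_freq)

# Hypothetical toy corpus: "koha" is rare in titles but common in notes.
docs = [
    {"title": ["koha", "manual"],
     "note": ["installation", "guide", "for", "the", "library", "system"]},
    {"title": ["cooking", "basics"],
     "note": ["this", "copy", "was", "donated", "by", "a", "koha", "user"]},
    {"title": ["garden", "design"],
     "note": ["koha", "is", "mentioned", "once", "in", "this", "long", "note"]},
]
# Simulate a catch-all field by concatenating every field into one.
for d in docs:
    d["all"] = d["title"] + d["note"]

title_hit = score(docs, 0, "title", "koha")   # match in a short, selective field
note_hit = score(docs, 1, "note", "koha")     # match in a long, noisy field
merged_0 = score(docs, 0, "all", "koha")      # same documents, merged field
merged_1 = score(docs, 1, "all", "koha")

print("per field:", round(title_hit, 3), "vs", round(note_hit, 3))
print("merged:   ", round(merged_0, 3), "vs", round(merged_1, 3))
```

In this toy data the title match clearly outscores the note match when each
field keeps its own statistics, while in the merged field the term occurs in
every document, so its IDF drops and the two documents score almost the same.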

I can actually see no benefit in using "all_*" fields, and no real downside
to instead generating a proper "fields" list containing all searchable fields.
I began working on a patch today (one of the reasons being that we need per
field boosting), and it's actually not a very complicated change. It might not
be ready tomorrow, but at least some time at the beginning of next week.
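
As a rough illustration of points 2 and 4 (this is not the patch mentioned
above), a query string query with an explicit "fields" list could be built
along these lines. The field names, the boost values, the ".exact" sub-field
suffix and the helper function are assumptions made up for this example; only
the "query_string", "fields", "field^boost" and "quote_field_suffix" parts are
standard Elasticsearch query DSL:

```python
import json

def build_query_string_query(user_query, field_boosts, quote_field_suffix=None):
    """Build a query_string request body with an explicit, boosted field list.

    field_boosts maps field name -> boost; a boost of 1 is left implicit.
    """
    fields = []
    for name, boost in field_boosts.items():
        fields.append(name if boost == 1 else f"{name}^{boost}")

    query_string = {"query": user_query, "fields": fields}
    if quote_field_suffix is not None:
        # Quoted phrases are matched against "<field><suffix>" instead; this
        # assumes the mapping defines such sub-fields with a stricter analyzer.
        query_string["quote_field_suffix"] = quote_field_suffix
    return {"query": {"query_string": query_string}}

# Hypothetical searchable fields and boosts.
body = build_query_string_query(
    'title:"open source" library',
    {"title": 3, "author": 2, "subject": 1},
    quote_field_suffix=".exact",
)
print(json.dumps(body, indent=2))
```

Because every field is listed by name, each one keeps its own mapping,
analyzer and relevancy statistics, and boosts can be adjusted per field
without reindexing.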

