https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38101
Bug ID: 38101
Summary: ES skips records with huge fields
Change sponsored?: ---
Product: Koha
Version: unspecified
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: P5 - low
Component: Searching - Elasticsearch
Assignee: [email protected]
Reporter: [email protected]
QA Contact: [email protected]
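I saw a case in the wild where a staff member copied and pasted some legal
text from a PDF into a 500 field. After that, the record could no longer be
found using ES.
The reason is fairly simple: ES has a maximum size it will accept for a single
term in a phrase index (32766 bytes of UTF-8, per the error below), and the
record fails to index once a field exceeds it.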
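As a rough illustration of that limit (a sketch only, not Koha or ES code), the
check below measures a field value in UTF-8 bytes rather than characters. The
32766 figure is taken from the error message in this report; the function name
and the sample values are made up for the example.

```python
# Limit quoted in the indexing error in this report (bytes of UTF-8).
MAX_TERM_BYTES = 32766


def exceeds_term_limit(value: str, limit: int = MAX_TERM_BYTES) -> bool:
    """Return True if the UTF-8 encoding of value is longer than limit bytes."""
    return len(value.encode("utf-8")) > limit


# The limit counts bytes, not characters, so accented text reaches it sooner.
ascii_note = "a" * 32766      # 32766 bytes: exactly at the limit
accented_note = "é" * 16384   # 16384 characters, but 32768 bytes

print(exceeds_term_limit(ascii_note))
print(exceeds_term_limit(accented_note))
```

A pasted note with non-ASCII characters can therefore fail at well under 32766
characters.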
To reproduce:
1. Have KTD running with ES:
$ ktd --proxy --es7 up -d
2. Perform a search
3. Open the first result for editing
4. Find a cool Wiki page with lots of paragraphs
5. Copy all of the paragraphs and paste them into a 500$a field on the record.
6. Repeat step 2
=> FAIL: The record is not found
7. Reindex manually:
$ ktd --shell
k$ perl misc/search_tools/rebuild_elasticsearch.pl --biblios --where "biblionumber=3" -v -v
=> FAIL: You get something like:
```
[22229] Committing final records...
One or more ElasticSearch errors occurred when indexing documents at
/kohadevbox/koha/Koha/SearchEngine/Elasticsearch/Indexer.pm line 148.
[22229] There were errors during indexing
Record #3 Document contains at least one immense term in field="note.raw"
(whose UTF8 encoding is longer than the max length 32766), all of which were
skipped. Please correct the analyzer to not produce such terms. The prefix of
the first immense term is: '[10, 109, 117, 115, 116, 97, 102, 97, 32, 102, 117,
101, 32, 101, 108, 32, 115, 101, 103, 117, 110, 100, 111, 32, 104, 105, 106,
111, 32, 100]...', original message: bytes can be at most 32766 in length; got
32771 (illegal_argument_exception) : max_bytes_length_exceeded_exception (bytes
can be at most 32766 in length; got 32771)
[22229] Total 1 records indexed
```
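The "prefix of the first immense term" in that log is a list of byte values. A
small Python sketch (not part of Koha) can turn it back into text to help spot
which field content caused the failure. The byte list and the two sizes are
copied from the log above; the variable names are only illustrative.

```python
# Byte values copied from the error message above.
prefix = [10, 109, 117, 115, 116, 97, 102, 97, 32, 102, 117,
          101, 32, 101, 108, 32, 115, 101, 103, 117, 110, 100, 111, 32, 104,
          105, 106, 111, 32, 100]

# Decode the bytes as UTF-8 to recover the start of the offending term.
decoded = bytes(prefix).decode("utf-8")
print(repr(decoded))

# Sizes reported in the same message: the term is only slightly over the limit.
overshoot = 32771 - 32766
print(overshoot)
```

Here the term starts with a newline followed by lowercase text, and it exceeds
the limit by only 5 bytes.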