https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=42880
--- Comment #1 from Martin Renvoize (ashimema) ---
Created attachment 200647
  --> https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=200647&action=edit
Bug 42880: Avoid the USMARC round-trip when building ES documents

For every record, marc_records_to_documents serialized it to USMARC and then
re-parsed it with MARC::Record->new_from_usmarc, purely to "prove it will
round-trip" before storing marc_data. Profiling shows that re-decode is
~20-25% of the per-record mapping CPU.

The round-trip only needs to fire for records that cannot be represented in
ISO 2709. Those cases are cheaply detectable from the record itself:

- the total length overflows the 5-digit leader length (> 99999),
- a single field overflows the 4-digit directory length (> 9999), or
- field data contains raw ISO 2709 separators (0x1d/0x1e/0x1f).

as_usmarc() carps only about the first. We now check all three and only run
the full round-trip decode when one is hit; on the happy path it is skipped.

The separator check makes the skip provably safe rather than relying on
MARC::Record's tolerance of stray separators. Those characters are invalid in
XML and are stripped on save by bug 35104, so in practice they never reach
indexing. This bug depends on 35104, and the check is belt-and-suspenders.

The MARCXML fallback for non-representable records is unchanged.

Test plan:
1. prove t/db_dependent/Koha/SearchEngine/Elasticsearch.t
2. Confirm a normal record is stored as base64ISO2709 and round-trips.
3. Confirm that an over-long record, or a record with an over-long field,
   falls back to MARCXML.
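
For illustration only, the three representability checks listed in the commit message could be sketched roughly as follows. This is not the Koha patch: it is written in Python rather than Perl, and the `Field` class and the `needs_round_trip` function are hypothetical names invented for this sketch, not part of MARC::Record or Koha. It assumes UTF-8 encoded data and the standard ISO 2709 layout (24-byte leader, 12-byte directory entries, one terminator byte per field, one for the directory and one for the record).

```python
from dataclasses import dataclass, field

# ISO 2709 structural constants.
LEADER_LEN = 24            # fixed-size leader
DIRECTORY_ENTRY_LEN = 12   # 3-char tag + 4-digit length + 5-digit offset
MAX_RECORD_LEN = 99999     # largest value the 5-digit leader length can hold
MAX_FIELD_LEN = 9999       # largest value the 4-digit directory length can hold
SEPARATORS = ("\x1d", "\x1e", "\x1f")  # record, field and subfield separators


@dataclass
class Field:
    """Hypothetical stand-in for a MARC field (not the MARC::Field API)."""
    tag: str
    data: str = ""          # used by control fields (00X) only
    indicators: str = "  "  # used by data fields only
    subfields: list = field(default_factory=list)  # [(code, value), ...]

    def is_control(self) -> bool:
        return self.tag.startswith("00")

    def values(self) -> list:
        """Return the user-supplied strings that must not contain separators."""
        if self.is_control():
            return [self.data]
        return [value for _code, value in self.subfields]

    def serialized_len(self) -> int:
        """Byte length as recorded in the directory, field terminator included."""
        if self.is_control():
            return len(self.data.encode("utf-8")) + 1
        # Each subfield costs: delimiter (1) + code (1) + value bytes.
        body = sum(2 + len(value.encode("utf-8")) for _code, value in self.subfields)
        return len(self.indicators) + body + 1


def needs_round_trip(fields) -> bool:
    """Return True if the record may not be representable in ISO 2709."""
    # Leader + directory entries + directory terminator.
    total = LEADER_LEN + DIRECTORY_ENTRY_LEN * len(fields) + 1
    for f in fields:
        # Check 3: raw separators inside field data.
        if any(sep in value for value in f.values() for sep in SEPARATORS):
            return True
        # Check 2: a single field overflows the 4-digit directory length.
        length = f.serialized_len()
        if length > MAX_FIELD_LEN:
            return True
        total += length
    # Check 1: total length (plus record terminator) overflows the leader.
    return total + 1 > MAX_RECORD_LEN
```

In a sketch like this, a caller would serialize to ISO 2709 directly when `needs_round_trip` returns false, and only run the full encode/decode verification (with the MARCXML fallback) when it returns true.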
