https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=42877
Bug ID: 42877
Summary: Memoize _get_marc_mapping_rules to speed up record indexing
Initiative type: ---
Sponsorship status: ---
Product: Koha
Version: Main
Hardware: All
OS: All
Status: NEW
Severity: enhancement
Priority: P5 - low
Component: Searching - Elasticsearch
Assignee: [email protected]
Reporter: [email protected]
QA Contact: [email protected]
Target Milestone: ---
Koha::SearchEngine::Elasticsearch::_get_marc_mapping_rules() rebuilds the whole
mapping ruleset (regex-parsing every search_marc_map entry) on every call to
marc_records_to_documents, and is not memoized.
For a bulk reindex (large batches) this is amortized over the batch. But for
incremental indexing -- every AddBiblio/ModBiblio enqueues a background job that
indexes a single record -- the full rebuild runs for every record.
Profiling on 435 records (KTD, Elasticsearch):
- _get_marc_mapping_rules alone: ~10.7 ms/call
- per-record mapping otherwise: ~3.6 ms
So rule rebuilding is roughly 75% of the CPU for a single-record index job.
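The 75% figure follows directly from the two timings above. A small worked calculation, assuming a single-record job is dominated by just these two costs (the variable names are illustrative only):

```perl
use strict;
use warnings;

# Timings taken from the profile above, in milliseconds.
my $rules_ms  = 10.7;    # _get_marc_mapping_rules, per call
my $record_ms = 3.6;     # remaining per-record mapping work

# Share of a single-record job spent rebuilding the rules.
my $share = $rules_ms / ( $rules_ms + $record_ms );
printf "rule rebuilding share: %.1f%%\n", 100 * $share;    # prints 74.8%
```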
The parsed rules are deterministic per (marcflavour, index). Caching them, the
same way _foreach_mapping already uses a 'state' cache, would make single-record
indexing roughly 3.5x faster and remove the cost from every code path.
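A minimal sketch of the caching idea, not the actual patch. _build_rules and get_rules_cached are hypothetical stand-ins for the real rule parsing, and the cache key simply joins the (marcflavour, index) pair:

```perl
use strict;
use warnings;
use feature 'state';

# Hypothetical stand-in for the expensive rule parsing; counts how often it runs.
my $builds = 0;

sub _build_rules {
    my ( $marcflavour, $index ) = @_;
    $builds++;

    # Placeholder for regex-parsing every search_marc_map entry.
    return { marcflavour => $marcflavour, index => $index };
}

# Memoized wrapper: the rules are built once per (marcflavour, index) pair
# and reused for the lifetime of the process.
sub get_rules_cached {
    my ( $marcflavour, $index ) = @_;
    state %cache;
    my $key = join "\0", $marcflavour, $index;
    $cache{$key} //= _build_rules( $marcflavour, $index );
    return $cache{$key};
}

get_rules_cached( 'marc21', 'biblios' ) for 1 .. 3;
get_rules_cached( 'marc21', 'authorities' );
print "rule builds: $builds\n";    # prints "rule builds: 2"
```

A process-lifetime cache like this would not notice mapping changes made while the process is running, so a real implementation may need some way to invalidate it.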
Test plan:
1. prove t/db_dependent/Koha/SearchEngine/Elasticsearch.t
2. Reindex a single biblio (or edit one) and confirm it still indexes
   correctly.
3. Benchmark single-record marc_records_to_documents before/after.
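For step 3, one possible timing harness is sketched below. ms_per_call is a hypothetical helper and the workload is a stand-in so the snippet runs on its own. In a Koha environment the closure would instead call marc_records_to_documents on a single record (assuming it takes an array reference of records).

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Returns the mean wall-clock milliseconds per call of $code over $iterations runs.
sub ms_per_call {
    my ( $code, $iterations ) = @_;
    my $start = [gettimeofday];
    $code->() for 1 .. $iterations;
    return 1000 * tv_interval($start) / $iterations;
}

# Stand-in workload; replace with a closure that indexes one record.
my $index_one_record = sub {
    my $n = 0;
    $n += $_ for 1 .. 1_000;
    return $n;
};

printf "%.3f ms/call\n", ms_per_call( $index_one_record, 200 );
```

Running the same harness before and after the change, with the same record and iteration count, gives directly comparable per-call figures.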