https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=25265
--- Comment #22 from Nick Clemens <[email protected]> --- (In reply to Marcel de Rooy from comment #19) > General > Module changes, no tests ? I will work on something here > We are now fetching MARC records in ModZebra for Elastic. We were before these patches > that if we use Elastic we should be leaving ModZebra as soon as we can ;) Agreed, but that is a bigger job, this is addressing a bug > The operation of sending biblionumbers and biblio records in two lists makes > me a bit suspicious. There is no check anymore that these lists are in sync. > If there would be a gap somehow, we would be indexing records on the wrong > id? Might be just theoretical but looks suboptimal. As you note later, we only pass a record from ModAuthority, otherwise we are building the lists > It seems that ModZebra is still inserting records into zebraqueue when we > use Elastic. Why? Should we skip and cleanup ? Z3950 was the original main reason, we have the z3950responder now, but keeping zebra in sync is not a bad thing. Eventually yes, but for now the dafety net has been useful. Beyond the scope of this bug > The for loop in ModZebra seems to be suboptimal? In a loop you are checking > if a record count is zero and if so, insert one record. > Why dont you do one DELETE statement and one INSERT ? delete from > zebra_queue where biblio_auth_number IN ($str) etc. > Actually thinking about that: if you do one db call per record, why dont you > add the loop in batch mod without touching ModZebra ? > [Wrote this earlier: see my conclusion too.] That is essentially what we have now, a loop in the batch operations, calling ModZebra for each record. This could be improved, but I don't think this adds complexity so much as highlighting what we do > > Koha/SearchEngine/Elasticsearch/Indexer.pm dates from 2013 but still > includes: > $indexer->update_index_background does just refer to update_index: TODO > implement in the future - I don't know the best way of doing this yet. > Same $indexer->delete_index_background # TODO: Should be made async This is work we should do, we don't have sponsors or the infrastructure yet, hopefully the task queue can help here > > C4/AuthoritiesMarc.pm: ModZebra( $authid, 'specialUpdate', > 'authorityserver', $record ); > Is this the only use of the record parameter in ModZebra ? Apparently. For > Zebra we don't need it, for Elastic we now do. This code is weird, I agree, but beyond the scope of this bug > This part in ModZebra is really troublesome too. We do not even check if we > got a biblio or an authority record ! > unless ($record) { > $record = GetMarcBiblio({ > biblionumber => $biblionumber, > embed_items => 1 }); > } This code is beyond the scope here, we are in C4/Biblio.pm - we really should only deal with biblio, and we do, except for the call above > Conclusion: > Looks like we can still improve a lot of the Elastic code. Yes we definitely can > Adding the > current code in the old ModZebra is not the way to go imo. Rewriting and moving this will be a bigger work and this is a bug that affects stables > Move that code to > SearchEngine by just adding a call. > Since we only have one use of the record > parameter in ModZebra, I would be inclined to remove it for consistency. I > understand that it is useful as optimization though. Please file a new bug for this > Adding a loop somewhere else around ModZebra is probably the pragmatic way > too. We never call it for lists. Perhaps adding an iteration routine in a > module ? I don't see why looping in one place or the other matters. In fact, look at this comment in the code # true ModZebra commented until indexdata fixes zebraDB crashes (it seems they occur on multiple updates # at the same time > Changing status. Your comments are well appreciated Marcel, and they are true. We have technical debt here, and we need to rethink some of our methods. You have taken a very broad look at the code, and identified issues that affect many areas. Currently though, we have a bug that is affecting users of ES, and we need to support these users so we can continue to test and improve the ES code. I attempted to take a path here that makes smaller changes, the basic idea of calling for index updates by batch rather than individually I think is sound. I will work on tests and take a look at the Zebra loop and would ask you to reconsider some of the larger points as future enhancements -- You are receiving this mail because: You are watching all bug changes. _______________________________________________ Koha-bugs mailing list [email protected] https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-bugs website : http://www.koha-community.org/ git : http://git.koha-community.org/ bugs : http://bugs.koha-community.org/
