https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38416

--- Comment #15 from David Cook <[email protected]> ---
(In reply to Andrii Nugged from comment #14)
> After this patch, it DIES inside this sub for something like 1/5 of my
> records,

Thanks for reporting this. Looking again at my code, I can see how that could
be a risk.

> UTF-8 "\xC3" does not map to Unicode at /usr/share/perl5/MARC/File/Encode.pm line 35.
> 
> - it happens in the line:
> 
> $decoded_usmarc_record = MARC::Record->new_from_usmarc($usmarc_record);
> 
> 
> Note 1: I used the 24.11.xx branch with only the Elasticsearch.pm changes
> reverted, and the old code works properly: it reindexes and includes all
> records. With this patch, ~2500 records are missing from the index.

Are you able to see these records in your Koha search results? If they're
failing in the indexing code, surely they should be failing in the search code
too?

> Note 2: we have a lot of non-ASCII characters in Finnish-language and
> Cyrillic texts.

I haven't seen any problems in my non-English libraries, but perhaps they
haven't triggered much indexing recently. Looking at the above, it seems like
you might have a data problem. What value do you have in position 09 of the
leader (the character coding scheme: 'a' for UCS/Unicode, blank for MARC-8)?
I vaguely remember that there might be some Koha code somewhere that forces
UTF-8 on records even when the MARC records themselves aren't marked as UTF-8...

> I am still researching why.

I'll create a new bug report and add a patch with an eval or try/catch so that
a single bad record doesn't crash the whole indexing run, but I am curious
about the underlying cause too.
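A rough sketch of the per-record eval approach, written as generic Perl rather than the actual patch: each record is decoded inside `eval { ...; 1 }`, and failures are collected instead of aborting the batch. `decode_records_safely` is a hypothetical helper, and `$decoder` is an assumed stand-in for whatever call does the decoding (for example a wrapper around `MARC::Record->new_from_usmarc`).

```perl
use strict;
use warnings;

# Hypothetical helper: decode each raw record with $decoder, catching
# per-record failures so one bad record does not abort the whole batch.
# Example decoder (assumption): sub { MARC::Record->new_from_usmarc( $_[0] ) }
sub decode_records_safely {
    my ( $raw_records, $decoder ) = @_;
    my ( @decoded, @failed );

    for my $i ( 0 .. $#$raw_records ) {
        my $record;
        # The eval block yields 1 on success and undef if $decoder dies;
        # in that case the error message is left in $@.
        my $ok = eval { $record = $decoder->( $raw_records->[$i] ); 1 };
        if ($ok) {
            push @decoded, $record;
        }
        else {
            my $err = $@ || 'unknown error';
            chomp $err;
            # Keep the index so the bad record can be found and inspected later.
            push @failed, { index => $i, error => $err };
        }
    }
    return ( \@decoded, \@failed );
}
```

Collecting the failures with their positions, rather than silently skipping them, keeps the bad records visible, which should help with tracking down the underlying cause.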

_______________________________________________
Koha-bugs mailing list
[email protected]
https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-bugs
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/