https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38416

--- Comment #20 from David Cook <[email protected]> ---
(In reply to Andrii Nugged from comment #14)
> - so, on rebuild_elasticsearch.pl it dies with such message:
> 
> UTF-8 "\xC3" does not map to Unicode at /usr/share/perl5/MARC/File/Encode.pm
> line 35.
> 
> - it happens in the line:
> 
> $decoded_usmarc_record = MARC::Record->new_from_usmarc($usmarc_record);

> I am researching why, but I am still in the process.

After reviewing the main branch and v24.11.00, this error seems very unlikely to occur there.

If you had bad UTF-8 data, the MARC::Record object would fail to be created
from the MARCXML inside an eval {} block.

Failing at '$decoded_usmarc_record =
MARC::Record->new_from_usmarc($usmarc_record);' with a UTF-8 encoding error
just doesn't make sense.
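For what it's worth, the error message itself is consistent with a lone C3
byte: 0xC3 is the lead byte of a two-byte UTF-8 sequence, so by itself it
cannot map to any Unicode character. A minimal illustration in Python (not
Koha code, just the decoding behaviour):

```python
# A lone 0xC3 byte is an incomplete two-byte UTF-8 sequence, so strict
# decoding rejects it -- the same class of failure MARC::File::Encode
# reports as '"\xC3" does not map to Unicode'.
try:
    b"\xC3".decode("utf-8")
except UnicodeDecodeError as e:
    print(e.reason)
```

So the question is less "why does decoding fail" than "how did a bare C3
byte survive into the stored record at all".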

I wrote a little script to inject a "\xC3" byte into a UTF-8 record, to try
updating a record via the Koha APIs with mixed encodings, but something along
the way converted it into the UTF-8 replacement character (EF BF BD)...
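That substitution is what lenient decoders do: the invalid byte is swapped
for U+FFFD, whose UTF-8 encoding is exactly the EF BF BD sequence I saw. A
hypothetical sketch of that behaviour (the sample string is made up):

```python
# A stray 0xC3 in otherwise valid UTF-8; lenient decoding replaces it
# with U+FFFD rather than failing.
mixed = b"caf\xC3 latte"
repaired = mixed.decode("utf-8", errors="replace")
print(repaired)

# U+FFFD encodes to the EF BF BD bytes observed in the stored record.
print("\ufffd".encode("utf-8").hex())
```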

I was more brutal and tried injecting raw C3 bytes directly into the text,
but either DBI or MySQL itself seems to attempt automatic damage control and
turns the single C3 byte into the two-byte UTF-8 sequence C3 83.
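The C3 83 result is the classic Latin-1-to-UTF-8 re-encoding: some layer
treats the raw byte as Latin-1 (where 0xC3 is U+00C3) and re-encodes it as
UTF-8. A hypothetical sketch of that conversion:

```python
# Suspected "damage control" path: interpret the raw byte as Latin-1,
# then re-encode as UTF-8, so one byte (C3) becomes two (C3 83).
raw = b"\xC3"
as_latin1 = raw.decode("latin-1")     # U+00C3
reencoded = as_latin1.encode("utf-8")
print(reencoded.hex())
```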

There might be some combination of bad bytes out there that can trigger the
error you're seeing, but I can't find it.

-- 
You are receiving this mail because:
You are watching all bug changes.
_______________________________________________
Koha-bugs mailing list
[email protected]
https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-bugs
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/
