[Koha-bugs] [Bug 17842] Broken diacritics on records exported as MARC from cart

bugzilla-daemon Mon, 25 May 2020 16:28:00 -0700

https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=17842


David Cook <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[email protected]

--- Comment #19 from David Cook <[email protected]> ---
Given my bad experience the other day trying to import records converted from
GB2312 to UTF-8 into Koha, I'm especially interested in this. It may be a
related topic.

At a glance, those sample records look fine in both Latin-1 and UTF-8.

MarcEdit can convert the ISO MARC into its MRK format, but I'm failing to
convert it from ISO MARC to MARCXML. 

When I try to read your sample records as UTF-8 using MARC::File::USMARC, I see
the following error:

UTF-8 "\xFC" does not map to Unicode

Using "xxd cart.iso2709", I see that the "fc" byte is the ü in über and für.
Ah, and FC is ü in Latin-1 encoding whereas in UTF-8 it's C3 BC. 
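
To illustrate the byte difference described above, here is a minimal Python
sketch. It is not Koha code; the sample word and variable names are chosen
only for illustration. It encodes "für" both ways and shows that the Latin-1
bytes are rejected by a strict UTF-8 decoder, which is the same class of
failure as the error quoted above.

```python
# Illustrative only: the sample text and variable names are assumptions.
text = "für"

latin1_bytes = text.encode("latin-1")  # ü becomes the single byte FC
utf8_bytes = text.encode("utf-8")      # ü becomes the two bytes C3 BC

print(latin1_bytes.hex(" "))  # 66 fc 72
print(utf8_bytes.hex(" "))    # 66 c3 bc 72

# Reading the Latin-1 bytes as UTF-8 fails, because FC cannot start a
# valid UTF-8 sequence.
decode_error = None
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    decode_error = exc
    print(f"decode failed: {exc}")
```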

So it sounds like Koha is exporting as Latin-1 but trying to import as UTF-8
and that's where it's falling over? 
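
One way to check an exported file for this kind of mismatch, sketched in
Python under the assumption that the file is expected to be UTF-8 throughout.
The helper name first_invalid_utf8_offset is hypothetical and the sample
bytes are made up. It reports the offset of the first byte that is not valid
UTF-8, which could then be inspected with xxd.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the offset of the first byte that is not valid UTF-8, or None.

    Hypothetical diagnostic helper, not part of Koha or MARC::File::USMARC.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start  # index of the first undecodable byte
    return None


# Made-up sample data: the same word encoded two different ways.
latin1_sample = "für".encode("latin-1")
utf8_sample = "für".encode("utf-8")

print(first_invalid_utf8_offset(latin1_sample))  # 1
print(first_invalid_utf8_offset(utf8_sample))    # None
```

For a real export, the same function could be applied to the raw bytes of the
file, for example the contents of cart.iso2709 read in binary mode.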

Needs more investigating, but that's the problem with your sample records I'd
say.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are watching all bug changes.
_______________________________________________
Koha-bugs mailing list
[email protected]
https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-bugs
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/
