https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=35104

--- Comment #18 from David Cook <[email protected]> ---
(In reply to David Cook from comment #17)
> I'm going to poke around in this a bit more...

TransformHtmlToMarc doesn't seem to affect it...

If I do $record->as_formatted then I see:

like&#2;minded

If I do $record->as_xml then I see:

&amp;#2;

Looking at
https://metacpan.org/dist/MARC-File-XML/source/lib/MARC/File/XML.pm#L378
there is an escape function that escapes ampersands and angle brackets.

In theory, maybe MARC::File::XML should escape any invalid characters using
character references or remove them since they're invalid.

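But MARC::File::XML's escaping means it's impossible for us to pre-escape any
invalid characters: the ampersand of our character reference gets escaped
again, which is how &#2; ends up as &amp;#2; in the as_xml output.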
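
To illustrate the double escaping, here is a minimal Python sketch. It is not
the MARC::File::XML code; escape_xml_text is a hypothetical stand-in that
assumes the escape step only rewrites ampersands and angle brackets, as
described above.

```python
def escape_xml_text(text: str) -> str:
    """Hypothetical stand-in for the escape step: handles only &, < and >."""
    # The ampersand must be replaced first so the entities added
    # for < and > are not escaped a second time.
    return (
        text.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
    )


# Field data that already contains a character reference ("pre-escaped").
pre_escaped = "like&#2;minded"

# The escape step cannot tell an intentional reference from a bare
# ampersand, so the reference itself gets escaped.
double_escaped = escape_xml_text(pre_escaped)
print(double_escaped)
```

Under that assumption, any reference we put into the data ourselves is turned
into literal text before it reaches the XML output.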

It feels like MARC::File::XML is essentially holding us hostage. We need to
clean our input data (in whatever format) before it reaches MARC::File::XML,
which seems a bit silly, since it's the XML format which has the
restrictions...

That being said... the XML 1.0 spec is pretty forgiving. After review, it's
really just excluding the ASCII control characters other than tab, line feed,
and carriage return, plus the Unicode surrogates, U+FFFE, and U+FFFF. That's a
really small number of characters and none of them are printable characters.
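
As a rough sketch of what cleaning the input could look like, the Python
below removes everything outside the XML 1.0 "Char" production. The function
name strip_invalid_xml_chars is made up for illustration, and whether to
remove or replace such characters is a separate decision; this only shows how
small the excluded set is.

```python
import re

# Complement of the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML10 = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Remove characters that XML 1.0 does not allow (hypothetical helper)."""
    return _INVALID_XML10.sub("", text)


# U+0002 is a control character outside the allowed set, so it is dropped.
cleaned = strip_invalid_xml_chars("like\x02minded")
print(cleaned)
```

Tab, line feed, and carriage return pass through untouched, as does any
printable character, including those outside the Basic Multilingual Plane.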
