Hello Guilers!

RhodiumToad found an error in sxml where CDATA is not parsed properly: &gt; would be converted to > inside CDATA blocks. This is probably due to a misreading of the XML spec:
"Within a CDATA section, only the CDEnd string is recognized as markup, so
that left angle brackets and ampersands may occur in their literal form; they
need not (and cannot) be escaped using '&lt;' and '&amp;'."
Notice that it says only CDEnd is recognized as markup, but omits &gt; from
the enumeration of things that need not (and cannot) be escaped.
No other XML library I know of behaves this way. Take for example Python's ElementTree:
Python 2.7.17 (default, Dec 23 2019, 21:25:33)
>>> import xml.etree.ElementTree as ET
>>> root = ET.fromstring("<e><![CDATA[&gt;]]></e>")
>>> root.text
'&gt;'
The same input with the unpatched (sxml ssax) (or rather (sxml simple)) looks
different:
(xml->sxml "<e><![CDATA[&gt;]]></e>")
;; => (*TOP* (e ">"))
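
To make the expected behaviour explicit, here is a small sketch (Python 3, standard library only) that contrasts an entity reference in ordinary character data with the same characters inside a CDATA section. The variable names are just for illustration; it shows what ElementTree does and is not a description of the patch itself.

```python
import xml.etree.ElementTree as ET

# Outside CDATA, "&gt;" is an entity reference and is expanded to ">".
plain = ET.fromstring("<e>&gt;</e>").text

# Inside CDATA, only "]]>" is markup, so "&gt;" stays as four literal characters.
cdata = ET.fromstring("<e><![CDATA[&gt;]]></e>").text

# The same holds for "&lt;" and "&amp;": none of them is expanded inside CDATA.
mixed = ET.fromstring("<e><![CDATA[&lt;&amp;&gt;]]></e>").text

print(repr(plain))  # '>'
print(repr(cdata))  # '&gt;'
print(repr(mixed))  # '&lt;&amp;&gt;'
```

With the patch applied, I would expect xml->sxml to keep the CDATA text literal in the same way.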
The question is whether this patch should be sent upstream. Since there has
been very little activity there, I suspect it is a lost cause.
Failing tests have been looked through, verified and fixed. No unexpected
errors were encountered. All SXML tests pass after this patch.
Best regards
Linus Björnstam
0001-module-sxml-upstream-SSAX.scm-Fix-improper-handling-.patch
