Hi,
Gertjan Klein wrote on 02.05.24 at 17:52:
On 25-04-2024 at 16:58, Stefan Behnel wrote:
I'm trying to write a conversion program that
outputs XML[2]. It must match the output of an existing program.
Semantically it already does, but I'd like it to match the way CDATA is
handled. To this end, I'd like to allow "wrapped" CDATA. The CDATA class
currently disallows this: it checks for the presence of ']]>', and raises
if found.
The exception probably dates from a time when libxml2 didn't handle this
itself.
I added a parameter to turn off this check. I expected to need to do the
escaping myself, but it seems lxml handles this just fine out of the box.
For example, this tester code:
from lxml import etree
from lxml.etree import CDATA
def main():
    root = etree.Element("dummy")
    txt = '<root><![CDATA[Something]]></root>'
    root.text = CDATA(txt, False)
Such a flag would need to be a keyword-only argument to make this readable.
It's entirely unclear what the "False" refers to, unless you know the call
signature by heart.
    out = etree.tostring(root).decode()
    print(out)

if __name__ == '__main__':
    main()
...prints this:
<dummy><![CDATA[<root><![CDATA[Something]]]]><![CDATA[></root>]]></dummy>
Looks good to me. According to the XML spec (both 1.0 and 1.1), "CDATA
sections cannot nest":
https://www.w3.org/TR/REC-xml/#sec-cdata-sect
But splitting the CDATA section makes perfect sense. This does not even
need an option, we can just remove the check and add a test for it.
Do you want to propose a PR?
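
For reference, here is a minimal sketch of the splitting rule itself, done by hand with only the standard library. The helper name wrap_cdata is hypothetical and not part of lxml; it only illustrates how a ']]>' inside the text ends up spread over two adjacent CDATA sections, and that a parser reads the original text back.

```python
from xml.etree import ElementTree as ET


def wrap_cdata(text):
    """Wrap text in CDATA, splitting at each ']]>' (hypothetical helper)."""
    # End the current section right after ']]' and open a new one before '>',
    # so that no single section contains the terminator ']]>'.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


txt = "<root><![CDATA[Something]]></root>"
serialized = "<dummy>" + wrap_cdata(txt) + "</dummy>"
print(serialized)

# Adjacent CDATA sections are concatenated again when parsing.
parsed = ET.fromstring(serialized)
print(parsed.text == txt)
```

The serialized string is the same as the output quoted above, so a test could simply compare against it.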
The Python "xml.etree.ElementTree" package can also parse this correctly,
but escapes this on output since it doesn't support CDATA sections
directly. Thus, it seems best to add the test in "test_etree.py" rather
than "test_elementtree.py" since the behaviour of the two differs here.
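
A small sketch of that difference, using only the standard library: ElementTree reads the split CDATA sections back as one text value, but writes the text with entity escapes instead of CDATA. The variable names are illustrative only.

```python
from xml.etree import ElementTree as ET

# The split-CDATA output from the example earlier in this thread.
split_doc = (
    "<dummy><![CDATA[<root><![CDATA[Something]]]]>"
    "<![CDATA[></root>]]></dummy>"
)

elem = ET.fromstring(split_doc)
print(elem.text)  # the two sections joined into a single string

# ElementTree has no CDATA support on output, so the markup characters
# in the text are escaped as entities instead.
out = ET.tostring(elem, encoding="unicode")
print(out)
```

Parsing the escaped output again gives the same text, so the two forms are equivalent for a consumer; only the serialization differs.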
Stefan