Hi,

Gertjan Klein wrote on 02.05.24 at 17:52:
On 25-04-2024 at 16:58, Stefan Behnel wrote:
I'm trying to write a conversion program that outputs XML[2]. It must match the output of an existing program. Semantically it already does, but I'd like it to match the way CDATA is handled. To this end, I'd like to allow "wrapped" CDATA. The CDATA class currently disallows this: it checks for the presence of ']]>', and raises if found.

The exception probably dates from a time when libxml2 didn't handle this itself.

I added a parameter to turn off this check. I expected to need to do the escaping myself, but it seems lxml handles this just fine out of the box. For example, this tester code:

from lxml import etree
from lxml.etree import CDATA
def main():
     root = etree.Element("dummy")
     txt = '<root><![CDATA[Something]]></root>'
     root.text = CDATA(txt, False)

Such a flag would need to be a keyword-only argument to make this readable. It's entirely unclear what the "False" refers to, unless you know the call signature by heart.

     out = etree.tostring(root).decode()
     print(out)
if __name__ == '__main__':
     main()

...prints this:

<dummy><![CDATA[<root><![CDATA[Something]]]]><![CDATA[></root>]]></dummy>

Looks good to me. According to the XML spec (both 1.0 and 1.1), "CDATA sections cannot nest":

https://www.w3.org/TR/REC-xml/#sec-cdata-sect

But splitting the CDATA section makes perfect sense. This does not even need an option; we can just remove the check and add a test for it.
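
To illustrate the splitting by hand, here is a minimal sketch that uses only the standard library, not lxml. The helper name "wrap_cdata" is hypothetical and is not part of lxml or libxml2; it merely imitates the serialised form printed above by ending the section after "]]" and opening a new one before ">".

```python
import xml.etree.ElementTree as ET


def wrap_cdata(text: str) -> str:
    """Hypothetical helper (not an lxml API): wrap text in CDATA.

    Every ']]>' in the text is split across two adjacent CDATA
    sections, so the end marker never appears inside a section.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


payload = "<root><![CDATA[Something]]></root>"
document = "<dummy>" + wrap_cdata(payload) + "</dummy>"
print(document)

# A conforming parser joins the adjacent sections back together.
parsed = ET.fromstring(document)
print(parsed.text == payload)
```

The document string built here is the same as the output quoted above, and parsing it gives back the original text unchanged.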

Do you want to propose a PR?

The Python "xml.etree.ElementTree" package can also parse this correctly, but escapes the text on output since it doesn't support CDATA sections directly. Thus, it seems best to add the test in "test_etree.py" rather than "test_elementtree.py", since the behaviour of the two differs here.
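
A quick way to see the ElementTree side of this, as a sketch with the standard library only (the variable names are just for illustration):

```python
import xml.etree.ElementTree as ET

payload = "<root><![CDATA[Something]]></root>"
document = (
    "<dummy><![CDATA[<root><![CDATA[Something]]]]>"
    "<![CDATA[></root>]]></dummy>"
)

# Parsing merges the two adjacent CDATA sections into one text value.
parsed = ET.fromstring(document)
print(parsed.text == payload)

# Serialising writes plain escaped text; no CDATA section is emitted.
serialized = ET.tostring(parsed)
print(serialized.decode())

# The escaped form still round-trips to the same text.
print(ET.fromstring(serialized).text == payload)
```

The text content survives the round trip, but the serialised form differs from what lxml writes, which is why a shared test would not fit both implementations.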

Stefan

_______________________________________________
lxml - The Python XML Toolkit mailing list -- lxml@python.org
To unsubscribe send an email to lxml-le...@python.org
https://mail.python.org/mailman3/lists/lxml.python.org/