Hi Karl,

You're not parsing content_text as XML or HTML, so lxml will be thinking it's just some text that looks horribly like XML but is not XML, and therefore needs to be escaped to be included within XML.
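To make the difference concrete, here is a minimal sketch comparing the two approaches: assigning the markup to .text versus parsing it. The names as_text, as_tree, escaped and parsed are just for illustration, and the fallback import is only there so the snippet runs where lxml isn't installed (the standard library's ElementTree behaves the same way for these particular calls).

```python
try:
    import lxml.etree as etree
except ImportError:
    # Fallback for illustration only; these calls behave the same here.
    import xml.etree.ElementTree as etree

content_text = '<p>line one</p><p>line two</p>'

# Case 1: assign the markup to .text. The element has no children,
# only character data, so the angle brackets are escaped on output.
as_text = etree.Element('en-note')
as_text.text = content_text
escaped = etree.tostring(as_text).decode('ascii')

# Case 2: parse the markup inside a wrapper element. The <p> tags
# become real child elements and are serialised as markup.
as_tree = etree.fromstring('<en-note>' + content_text + '</en-note>')
parsed = etree.tostring(as_tree).decode('ascii')

print(escaped)                      # contains &lt;p&gt; entities
print(parsed)                       # contains real <p> elements
print(len(as_text), len(as_tree))   # number of child elements: 0 and 2
```

Note that parsing this way only works when the fragment is well-formed XML once wrapped in a single root element.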
The following, which parses the fragment with etree.XML() rather than assigning it to .text:

import lxml.etree as etree

content_text = '<p>line one</p><p>line two</p>'

# Parse the fragment inside a wrapper so the <p> tags become child elements
en_note_el = etree.XML(f'<en-note>{content_text}</en-note>')
en_note_doctype = '<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">'
en_note_str = etree.tostring(en_note_el, encoding='UTF-8', method="xml",
                             xml_declaration=True,
                             pretty_print=False, standalone=False,
                             doctype=en_note_doctype)

# Wrap the serialised en-note document in a CDATA section
content_el = etree.Element('content')
content_el.text = etree.CDATA(en_note_str)
print(etree.tostring(content_el).decode('utf8'))

Produces the output:

<content><![CDATA[<?xml version='1.0' encoding='UTF-8' standalone='no'?>
<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">
<en-note><p>line one</p><p>line two</p></en-note>]]></content>

Which I would expect is what you're after?

Cheers,
aid

> On 18 Aug 2022, at 15:57, k...@cs.stanford.edu wrote:
>
> Hello, I need to add some HTML inside XML. The result should look like this:
>
> <content>
> <![CDATA[<?xml version="1.0" encoding="UTF-8" standalone="no"?>
> <!DOCTYPE en-note SYSTEM
> "http://xml.evernote.com/pub/enml2.dtd"><en-note><p>line one</p><p>line
> two</p></en-note>]]>
> </content>
>
> the code i'm using is this:
> # read html from file - result is :
> content_text = '<p>line one</p><p>line two</p>'
>
> en_note_el = etree.Element('en-note')
> en_note_el.text = content_text
> en_note_doctype = '<!DOCTYPE en-note SYSTEM
> "http://xml.evernote.com/pub/enml2.dtd">'
> en_note_str = etree.tostring(en_note_el, encoding='UTF-8', method="xml",
> xml_declaration=True,
> pretty_print=False, standalone=False,
> doctype=en_note_doctype)
>
> content_el = etree.SubElement(note_el, 'content')
> content_el.text = etree.CDATA(en_note_str)
> ==
>
> This works, except the included HTML in the text element of en-note is
> escaped. Can you help me figure how to not have it be escaped? The contents
> inside the <en-note> tags are supposed to be valid HTML, but without any
> <html> or <body> sections, and there isn't really a root element.
> _______________________________________________
> lxml - The Python XML Toolkit mailing list -- lxml@python.org
> To unsubscribe send an email to lxml-le...@python.org
> https://mail.python.org/mailman3/lists/lxml.python.org/
> Member address: a...@logic.org.uk