Hi,

I dug a little deeper and built lxml myself to investigate this possible issue.

I think this is an actual bug, and I also found a way to fix it.
With my fix, all tests run by "make test" pass, as does the simple test suite linked in my previous mail below, which I wrote to demonstrate the problem.

So far I have only tested on Linux (20.04 LTS, Python 2.7.18 / Python 3.8.10), not on Windows or macOS. Could one of you verify the patch on those platforms?

Patch against lxml master (v4.7.0a0, tag lxml-4.7.0-pre, commit 982f8d5612925010a12a70748a077af846def6be): https://pastebin.com/raw/x0Zmb0Kn

Should I create a bug report in your Launchpad tracker so that this patch can be merged (if acceptable)?
What do you think about the way it has been fixed?
I think the main problem is the comparison of byte strings with unicode strings for namespaces, prefixes and URIs. I'm not sure whether there are other places where this needs to be fixed as well.
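
To illustrate the kind of mismatch I mean, here is a minimal sketch in plain Python. It is not the lxml code and not the patch itself; the dict `declared` and the helper `normalize` are made-up names for illustration, and I am only assuming that a byte-string key meets a unicode lookup somewhere along the way:

```python
def normalize(value):
    """Return value as a text string, decoding UTF-8 byte strings."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return value


# Hypothetical namespace table that ended up with byte-string entries.
declared = {b"http://www.w3.org/1999/xhtml": b"xhtml"}

# The URI as a text string, the type shown in the captured stdout.
uri = "http://www.w3.org/1999/xhtml"

# On Python 2 a byte string and an equal ASCII unicode string compare equal,
# so this lookup succeeds. On Python 3 bytes never equal str, so it fails.
naive_hit = uri in declared

# Normalizing both sides to text makes the lookup work on both versions.
normalized = {normalize(k): normalize(v) for k, v in declared.items()}
fixed_prefix = normalized.get(normalize(uri))

print(naive_hit, fixed_prefix)
```

If something like this is what happens inside the serializer, it would explain why the namespace is reported as not declared on Python 3 only, even though it is clearly present.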

Greetings,

Kai

On 12.11.21 at 15:09, Kai Hillmann wrote:
> Hi,
>
>
> I'm not sure whether I have overlooked something, but it seems that etree.tostring with method c14n2 does not behave the same way in Python 2 and Python 3. In Python 2 it works as expected; in Python 3 it complains about an undeclared namespace that is in fact declared (see the captured stdout in the tests).
>
> I put together a simple TestCase class (https://pastebin.com/raw/fgMjy0Ax) which shows the different behaviour when run against the latest released lxml version, 4.6.4, under Python 2 and Python 3:
>
> c:\python27\python.exe -m nose ./py3_test.py
> ....
> ----------------------------------------------------------------------
> Ran 4 tests in 0.014s
>
> OK
>
> c:\python39\python.exe -m nose ./py3_test.py
> EEEE
> ======================================================================
> ERROR: test_python3_problem_bytesio_iterparse (py3_test.LXML_C14N2_RegressionTest)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "c:\devel\code\cs.requirements\py3_test.py", line 18, in test_python3_problem_bytesio_iterparse
>     handle_div_end(event, element)
>   File "c:\devel\code\cs.requirements\py3_test.py", line 13, in handle_div_end
>     etree.tostring(element, method="c14n2")
>   File "src\lxml\etree.pyx", line 3407, in lxml.etree.tostring
>   File "src\lxml\serializer.pxi", line 943, in lxml.etree._tree_to_target
>   File "src\lxml\serializer.pxi", line 1128, in lxml.etree.C14NWriterTarget.start
>   File "src\lxml\serializer.pxi", line 1155, in lxml.etree.C14NWriterTarget._start
>   File "src\lxml\serializer.pxi", line 1085, in lxml.etree.C14NWriterTarget._qname
> ValueError: Namespace http://www.w3.org/1999/xhtml of name "div" is not declared in scope
> -------------------- >> begin captured stdout << ---------------------
> <class 'str'> <class 'str'> some_ns_id = http://www.example.com
> <class 'str'> <class 'str'> xhtml = http://www.w3.org/1999/xhtml
>
> --------------------- >> end captured stdout << ----------------------
>
> ======================================================================
> ERROR: test_python3_problem_bytesio_iterparse_global_ns_registration (py3_test.LXML_C14N2_RegressionTest)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "c:\devel\code\cs.requirements\py3_test.py", line 34, in test_python3_problem_bytesio_iterparse_global_ns_registration
>     handle_div_end(event, element)
>   File "c:\devel\code\cs.requirements\py3_test.py", line 29, in handle_div_end
>     etree.tostring(element, method="c14n2")
>   File "src\lxml\etree.pyx", line 3407, in lxml.etree.tostring
>   File "src\lxml\serializer.pxi", line 943, in lxml.etree._tree_to_target
>   File "src\lxml\serializer.pxi", line 1128, in lxml.etree.C14NWriterTarget.start
>   File "src\lxml\serializer.pxi", line 1155, in lxml.etree.C14NWriterTarget._start
>   File "src\lxml\serializer.pxi", line 1085, in lxml.etree.C14NWriterTarget._qname
> ValueError: Namespace http://www.w3.org/1999/xhtml of name "div" is not declared in scope
> -------------------- >> begin captured stdout << ---------------------
> <class 'str'> <class 'str'> some_ns_id = http://www.example.com
> <class 'str'> <class 'str'> xhtml = http://www.w3.org/1999/xhtml
>
> --------------------- >> end captured stdout << ----------------------
>
> ======================================================================
> ERROR: test_python3_problem_filebased_iterparse (py3_test.LXML_C14N2_RegressionTest)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "c:\devel\code\cs.requirements\py3_test.py", line 49, in test_python3_problem_filebased_iterparse
>     handle_div_end(event, element)
>   File "c:\devel\code\cs.requirements\py3_test.py", line 44, in handle_div_end
>     etree.tostring(element, method="c14n2")
>   File "src\lxml\etree.pyx", line 3407, in lxml.etree.tostring
>   File "src\lxml\serializer.pxi", line 943, in lxml.etree._tree_to_target
>   File "src\lxml\serializer.pxi", line 1128, in lxml.etree.C14NWriterTarget.start
>   File "src\lxml\serializer.pxi", line 1155, in lxml.etree.C14NWriterTarget._start
>   File "src\lxml\serializer.pxi", line 1085, in lxml.etree.C14NWriterTarget._qname
> ValueError: Namespace http://www.w3.org/1999/xhtml of name "div" is not declared in scope
> -------------------- >> begin captured stdout << ---------------------
> <class 'str'> <class 'str'> some_ns_id = http://www.example.com
> <class 'str'> <class 'str'> xhtml = http://www.w3.org/1999/xhtml
>
> --------------------- >> end captured stdout << ----------------------
>
> ======================================================================
> ERROR: test_python3_problem_filebased_parse (py3_test.LXML_C14N2_RegressionTest)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "c:\devel\code\cs.requirements\py3_test.py", line 62, in test_python3_problem_filebased_parse
>     serialize_div_element(div)
>   File "c:\devel\code\cs.requirements\py3_test.py", line 58, in serialize_div_element
>     etree.tostring(element, method="c14n2")
>   File "src\lxml\etree.pyx", line 3407, in lxml.etree.tostring
>   File "src\lxml\serializer.pxi", line 943, in lxml.etree._tree_to_target
>   File "src\lxml\serializer.pxi", line 1128, in lxml.etree.C14NWriterTarget.start
>   File "src\lxml\serializer.pxi", line 1155, in lxml.etree.C14NWriterTarget._start
>   File "src\lxml\serializer.pxi", line 1085, in lxml.etree.C14NWriterTarget._qname
> ValueError: Namespace http://www.w3.org/1999/xhtml of name "div" is not declared in scope
> -------------------- >> begin captured stdout << ---------------------
> <class 'str'> <class 'str'> some_ns_id = http://www.example.com
> <class 'str'> <class 'str'> xhtml = http://www.w3.org/1999/xhtml
>
> --------------------- >> end captured stdout << ----------------------
>
> ----------------------------------------------------------------------
> Ran 4 tests in 0.010s
>
> FAILED (errors=4)
>
> Could you give me a hint whether this is an actual bug or just incorrect usage?
>
> If it is a bug, should I create a new one in your bug tracker, or will you add one directly?
>
>
> Best regards,
>
> Kai
> _______________________________________________
> lxml - The Python XML Toolkit mailing list -- lxml@python.org
> To unsubscribe send an email to lxml-le...@python.org
> https://mail.python.org/mailman3/lists/lxml.python.org/
> Member address: k...@kaih.de
