Hi,
I dug a little deeper and built lxml myself to investigate this
possible issue.
I think I found that this is an actual bug, and also how to fix it.
With my fix, all of your "make test" tests pass, as does the simple
test suite linked in my previous mail below, which was written to
demonstrate the problem.
So far I have only tested on Linux (20.04 LTS, Python 2.7.18 /
Python 3.8.10), not on Windows or macOS. Could someone verify the
patch on those platforms?
Patch against lxml master (v4.7.0a0, tag: lxml-4.7.0-pre,
982f8d5612925010a12a70748a077af846def6be): https://pastebin.com/raw/x0Zmb0Kn
Should I file a bug report in your Launchpad tracker to get this patch
merged (if it is acceptable)?
What do you think about the way it has been fixed?
I think the main problem here is the comparison of byte strings with
unicode strings for namespaces, prefixes and URIs. I'm not sure whether
there are more places where this needs to be fixed as well.
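
To illustrate the kind of mismatch I mean, here is a minimal sketch in
plain Python 3. It is not the lxml code and not my patch; the dict and
the two helper functions are made up for illustration only.

```python
# Hypothetical illustration, not lxml code: a namespace map whose keys
# are byte strings, looked up with a text (unicode) string.
declared = {b"http://www.w3.org/1999/xhtml": b"xhtml"}

# The captured stdout in the tests shows the URI as <class 'str'>.
uri = "http://www.w3.org/1999/xhtml"


def lookup(ns_map, uri):
    """Plain lookup: returns None when the key types do not match."""
    return ns_map.get(uri)


def lookup_normalized(ns_map, uri):
    """Encode text to UTF-8 bytes first so both sides have the same type."""
    key = uri.encode("utf-8") if isinstance(uri, str) else uri
    return ns_map.get(key)


print(lookup(declared, uri))             # no match under Python 3
print(lookup_normalized(declared, uri))  # match after normalizing the key
```

Under Python 3 the first lookup finds nothing because bytes and str
never compare equal, even though the namespace is declared. Under
Python 2 a plain string literal is already a byte string, so the same
lookup would succeed, which would fit the difference I am seeing.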
Greetings,
Kai
On 12.11.21 at 15:09, Kai Hillmann wrote:
> Hi,
>
>
> I’m not sure if I just overlooked something, but it seems that
> etree.tostring using method c14n2 does not work the same way in
> Python 2 and Python 3. In Python 2 it works as expected; in Python 3 it
> complains about an undeclared namespace that is in fact still there
> (see the stdout information in the tests).
>
> I put together a simple TestCase class
> (https://pastebin.com/raw/fgMjy0Ax) which shows the different behaviour
> when run against the latest released lxml version, 4.6.4, using
> Python 2 or Python 3:
>
> c:\python27\python.exe -m nose ./py3_test.py
> ....
> ----------------------------------------------------------------------
> Ran 4 tests in 0.014s
>
> OK
>
> c:\python39\python.exe -m nose ./py3_test.py
> EEEE
> ======================================================================
> ERROR: test_python3_problem_bytesio_iterparse (py3_test.LXML_C14N2_RegressionTest)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "c:\devel\code\cs.requirements\py3_test.py", line 18, in test_python3_problem_bytesio_iterparse
>     handle_div_end(event, element)
>   File "c:\devel\code\cs.requirements\py3_test.py", line 13, in handle_div_end
>     etree.tostring(element, method="c14n2")
>   File "src\lxml\etree.pyx", line 3407, in lxml.etree.tostring
>   File "src\lxml\serializer.pxi", line 943, in lxml.etree._tree_to_target
>   File "src\lxml\serializer.pxi", line 1128, in lxml.etree.C14NWriterTarget.start
>   File "src\lxml\serializer.pxi", line 1155, in lxml.etree.C14NWriterTarget._start
>   File "src\lxml\serializer.pxi", line 1085, in lxml.etree.C14NWriterTarget._qname
> ValueError: Namespace http://www.w3.org/1999/xhtml of name "div" is not declared in scope
> -------------------- >> begin captured stdout << ---------------------
> <class 'str'> <class 'str'> some_ns_id = http://www.example.com
> <class 'str'> <class 'str'> xhtml = http://www.w3.org/1999/xhtml
>
> --------------------- >> end captured stdout << ----------------------
>
> ======================================================================
> ERROR: test_python3_problem_bytesio_iterparse_global_ns_registration (py3_test.LXML_C14N2_RegressionTest)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "c:\devel\code\cs.requirements\py3_test.py", line 34, in test_python3_problem_bytesio_iterparse_global_ns_registration
>     handle_div_end(event, element)
>   File "c:\devel\code\cs.requirements\py3_test.py", line 29, in handle_div_end
>     etree.tostring(element, method="c14n2")
>   File "src\lxml\etree.pyx", line 3407, in lxml.etree.tostring
>   File "src\lxml\serializer.pxi", line 943, in lxml.etree._tree_to_target
>   File "src\lxml\serializer.pxi", line 1128, in lxml.etree.C14NWriterTarget.start
>   File "src\lxml\serializer.pxi", line 1155, in lxml.etree.C14NWriterTarget._start
>   File "src\lxml\serializer.pxi", line 1085, in lxml.etree.C14NWriterTarget._qname
> ValueError: Namespace http://www.w3.org/1999/xhtml of name "div" is not declared in scope
> -------------------- >> begin captured stdout << ---------------------
> <class 'str'> <class 'str'> some_ns_id = http://www.example.com
> <class 'str'> <class 'str'> xhtml = http://www.w3.org/1999/xhtml
>
> --------------------- >> end captured stdout << ----------------------
>
> ======================================================================
> ERROR: test_python3_problem_filebased_iterparse (py3_test.LXML_C14N2_RegressionTest)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "c:\devel\code\cs.requirements\py3_test.py", line 49, in test_python3_problem_filebased_iterparse
>     handle_div_end(event, element)
>   File "c:\devel\code\cs.requirements\py3_test.py", line 44, in handle_div_end
>     etree.tostring(element, method="c14n2")
>   File "src\lxml\etree.pyx", line 3407, in lxml.etree.tostring
>   File "src\lxml\serializer.pxi", line 943, in lxml.etree._tree_to_target
>   File "src\lxml\serializer.pxi", line 1128, in lxml.etree.C14NWriterTarget.start
>   File "src\lxml\serializer.pxi", line 1155, in lxml.etree.C14NWriterTarget._start
>   File "src\lxml\serializer.pxi", line 1085, in lxml.etree.C14NWriterTarget._qname
> ValueError: Namespace http://www.w3.org/1999/xhtml of name "div" is not declared in scope
> -------------------- >> begin captured stdout << ---------------------
> <class 'str'> <class 'str'> some_ns_id = http://www.example.com
> <class 'str'> <class 'str'> xhtml = http://www.w3.org/1999/xhtml
>
> --------------------- >> end captured stdout << ----------------------
>
> ======================================================================
> ERROR: test_python3_problem_filebased_parse (py3_test.LXML_C14N2_RegressionTest)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "c:\devel\code\cs.requirements\py3_test.py", line 62, in test_python3_problem_filebased_parse
>     serialize_div_element(div)
>   File "c:\devel\code\cs.requirements\py3_test.py", line 58, in serialize_div_element
>     etree.tostring(element, method="c14n2")
>   File "src\lxml\etree.pyx", line 3407, in lxml.etree.tostring
>   File "src\lxml\serializer.pxi", line 943, in lxml.etree._tree_to_target
>   File "src\lxml\serializer.pxi", line 1128, in lxml.etree.C14NWriterTarget.start
>   File "src\lxml\serializer.pxi", line 1155, in lxml.etree.C14NWriterTarget._start
>   File "src\lxml\serializer.pxi", line 1085, in lxml.etree.C14NWriterTarget._qname
> ValueError: Namespace http://www.w3.org/1999/xhtml of name "div" is not declared in scope
> -------------------- >> begin captured stdout << ---------------------
> <class 'str'> <class 'str'> some_ns_id = http://www.example.com
> <class 'str'> <class 'str'> xhtml = http://www.w3.org/1999/xhtml
>
> --------------------- >> end captured stdout << ----------------------
>
> ----------------------------------------------------------------------
> Ran 4 tests in 0.010s
>
> FAILED (errors=4)
>
> Could you give me a hint whether this is an actual bug or just
> incorrect usage?
>
> If it is a bug – should I create a new one in your bug tracker or
> will you add one directly?
>
>
> Best regards,
>
> Kai
> _______________________________________________
> lxml - The Python XML Toolkit mailing list -- lxml@python.org
> To unsubscribe send an email to lxml-le...@python.org
> https://mail.python.org/mailman3/lists/lxml.python.org/
> Member address: k...@kaih.de