Package: python-lxml
Version: 1.0.1-2
Severity: normal
Hello,

Here is a transcript of a session using python-lxml; the bug is quite obvious:

In [1]:import lxml.etree
In [2]:doctree = lxml.etree.parse(file('/tmp/test.xml'))
In [3]:for tag in doctree.xpath('/tags/*'):
   .3.:    print tag.attrib['tag'], tag.attrib['count']
   .3.:
... lot of output
In [4]:toto = file('/tmp/test.xml').read()
In [5]:import StringIO
In [6]:lxml.etree.parse(StringIO.StringIO(toto))
---------------------------------------------------------------------------
exceptions.AssertionError                 Traceback (most recent call last)

/home/nicoe/projets/gnomolicious/src/<ipython console>

/home/nicoe/projets/gnomolicious/src/etree.pyx in etree.parse()

/home/nicoe/projets/gnomolicious/src/parser.pxi in etree._parseDocument()

/home/nicoe/projets/gnomolicious/src/parser.pxi in etree._parseMemoryDocument()

/home/nicoe/projets/gnomolicious/src/apihelpers.pxi in etree._utf8()

AssertionError: All strings must be Unicode or ASCII
> /home/nicoe/projets/gnomolicious/src/apihelpers.pxi(332)etree._utf8()
ipdb> q

Parsing a file through the StringIO interface should not fail when parsing
the same file through the file interface succeeds.

Since the same behavior happens with cStringIO, I suppose this bug is
related to python-lxml.

-- System Information:
Debian Release: testing/unstable
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: i386 (i686)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.16-2-k7
Locale: LANG=fr_BE.UTF-8, LC_CTYPE=fr_BE.UTF-8 (charmap=UTF-8)

Versions of packages python-lxml depends on:
ii  libc6           2.3.6-15         GNU C Library: Shared libraries
ii  libxml2         2.6.26.dfsg-1    GNOME XML library
ii  libxslt1.1      1.1.17-1         XSLT processing library - runtime
ii  python          2.3.5-10         An interactive high-level object-o
ii  python-central  0.5.0            register and build utility for Pyt
ii  zlib1g          1:1.2.3-11       compression library - runtime

python-lxml recommends no packages.

-- no debconf information
test.xml
Description: application/xml