Package: python-debian
Version: 0.1.16
Severity: important

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I was updating the codebase for the Debian patch tracker and have stumbled
across what I believe is a regression. Now that python-debian uses unicode
internally (since 0.1.15, it seems), if a Sources file contains both UTF-8
and Latin-1 encoded maintainer names (as the etch Sources file does), it
seems impossible to produce output from the resulting Sources instance.

The following code should illustrate it:

#!/usr/bin/python

from debian import deb822
import sys

fh = file(sys.argv[1], "r")
outf = file(sys.argv[2], "w")

slist = deb822.Sources.iter_paragraphs(fh)
for ent in slist:
    print ent['Package']
    outf.write(ent.dump().encode('utf-8'))
    outf.write("\n")

./testit.py /srv/patch-tracker/archive/dists/etch/main/source/Sources /dev/null
<snip lots of output>
cadaver
cadubi
Traceback (most recent call last):
  File "./testit.py", line 12, in <module>
    outf.write(ent.dump().encode('utf-8'))
  File "/usr/lib/pymodules/python2.5/debian/deb822.py", line 387, in dump
    value = self.get_as_string(key)
  File "/usr/lib/pymodules/python2.5/debian/deb822.py", line 904, in get_as_string
    return Deb822.get_as_string(self, key)
  File "/usr/lib/pymodules/python2.5/debian/deb822.py", line 362, in get_as_string
    return unicode(self[key])
  File "/usr/lib/pymodules/python2.5/debian/deb822.py", line 179, in __getitem__
    value = value.decode(self.encoding)
  File "/usr/lib/python2.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 2-5: invalid data

(The maintainer name for cadubi is in Latin-1.)

I've tried a few variants of dump(), str() and unicode(), catching the
UnicodeDecodeError and re-encoding in Latin-1, but the problem seems to be
within the iterator code: the values are decoded lazily with a single
encoding when they are accessed, so the caller never gets a chance to
handle a field that is in a different encoding.

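To illustrate the kind of behaviour I would expect, here is a minimal sketch of a per-value fallback: try UTF-8 first, then fall back to Latin-1. It is not python-debian code; decode_with_fallback is a hypothetical helper, the maintainer values are made up rather than taken from the etch Sources file, and the sketch is written to run on current Python with byte strings as input.

```python
# Hypothetical helper, not part of python-debian: decode one raw field
# value, trying each encoding in turn and returning the first success.
def decode_with_fallback(raw, encodings=("utf-8", "latin-1")):
    """Return the bytes in `raw` decoded with the first encoding that works."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Only reached if every listed encoding failed; Latin-1 accepts any
    # byte sequence, so with the default list this line is never hit.
    raise ValueError("none of %r could decode the value" % (encodings,))


# Made-up sample values (assumption): the same name, encoded two ways.
utf8_value = u"Jos\u00e9 Example <jose@example.org>".encode("utf-8")
latin1_value = u"Jos\u00e9 Example <jose@example.org>".encode("latin-1")

for value in (utf8_value, latin1_value):
    text = decode_with_fallback(value)
    # Both inputs can now be written back out as UTF-8 without an error.
    print(text.encode("utf-8"))
```

One caveat with this approach: a Latin-1 value whose bytes happen to form valid UTF-8 would be decoded as UTF-8, so the fallback is a heuristic rather than a reliable detection of the encoding.
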
- -- System Information:
Debian Release: squeeze/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.34-rc5minime-00800-g198000a (SMP w/2 CPU cores)
Locale: LANG=en_US.utf8, LC_CTYPE=en_US.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages python-debian depends on:
ii  python          2.5.4-9  An interactive high-level object-o
ii  python-support  1.0.8    automated rebuilding support for P

Versions of packages python-debian recommends:
ii  python-apt      0.7.95   Python interface to libapt-pkg

Versions of packages python-debian suggests:
ii  gpgv            1.4.10-3 GNU privacy guard - signature veri

- -- no debconf information

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iD8DBQFMF9J7ynjLPm522B0RAib+AJ0U6R4WSsqd3kdz5gtOMZlkqimHCACfZmAa
Tu+6uN9WwvU/AxMqI0SWrrA=
=r73W
-----END PGP SIGNATURE-----