On Fri, May 22, 2009 at 06:36:39PM +0200, Adeodato Simó wrote:
> UDD just has the descriptions from Packages.gz, which supposedly are in
> UTF-8. If your destination (a file, terminal, whatever) should be
> receiving UTF-8, you can just pass them unmodified, e.g.:
>
>     for row in curs.fetchall():
>         print "%s: %s (%s)\n%s\n" % (pkg, row[0], row[2], row[1])
>
> That works for me.
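For readers on Python 3, where `str` is already Unicode and raw database values arrive as `bytes`, the same distinction between passing UTF-8 through and decoding it explicitly looks roughly like this. It is a minimal sketch, not code from this thread: `decode_description` and `ascii_decode_fails` are hypothetical helpers, and the Latin-1 fallback is only one possible policy for descriptions that are not valid UTF-8, not necessarily what Adeodato had in mind.

```python
def decode_description(raw: bytes) -> str:
    """Return a package description as text (hypothetical helper).

    Tries UTF-8 first, since Packages.gz descriptions are supposed to be
    UTF-8. Falls back to Latin-1 (an assumed policy) so that a single
    mis-encoded package does not abort the whole loop.
    """
    try:
        # Name the encoding explicitly, like unicode(raw, 'utf-8') in Python 2.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this cannot fail,
        # although it may show wrong characters for other encodings.
        return raw.decode("latin-1")


def ascii_decode_fails(raw: bytes) -> bool:
    """Mimic Python 2's unicode(raw), which defaulted to ASCII."""
    try:
        raw.decode("ascii")
    except UnicodeDecodeError:
        return True
    return False


if __name__ == "__main__":
    good = "Café utilities".encode("utf-8")
    bad = b"caf\xe9"  # a Latin-1 byte sequence, invalid as UTF-8
    print(ascii_decode_fails(good))   # True
    print(decode_description(good))   # Café utilities
    print(decode_description(bad))    # café
```

The strict UTF-8 attempt keeps correct descriptions intact, while the fallback trades possible mojibake for robustness, which matches the advice that a program iterating over all packages should not crash on one bad description.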

Yes, this actually works fine.

> If, for some reason, you need unicode() and not str() objects, then you
> should specify that the string is in UTF-8, otherwise it will default to
> ASCII:
>
>     for row in curs.fetchall():
>         string = unicode(row[1], 'utf-8')

Ahh, that seems to be the solution I wanted. And yes, I need unicode
because I'm actually using

    from genshi import Markup
    string = Markup(row[1])

and I can confirm that

    string = Markup(unicode(row[1], 'utf-8'))

works.

> So, your test program is not of much help. If you're still stuck, you
> should probably say what you are really trying to do, with details. But
> I don't think it's going to be a problem in UDD.
>
> P.S.: If doing `unicode(row[1], 'utf-8')` raises an exception, that
> would be because a package contains non-UTF-8 in a description. Your
> program should be robust against that, and you can do:
> ...

Yes, I'm already doing this, based on past experience parsing Packages
files directly, but thanks for the hint anyway.

Thank you very much for the help

    Andreas.

-- 
http://fam-tille.de