On Wednesday, 10 January 2007 23:18, José Matos wrote:
> On Wednesday 10 January 2007 9:33 pm, Georg Baum wrote:
> > Ah, now I know the problem: If we add string literals to document.body we
> > need to prefix them with u to get unicode string literals: u'bla'. Now I
> > know where to search.
>
> That is enough to drive anyone (read me) crazy. :-)
Me too, but I found a workaround:
# Unfortunately we have a mixture of unicode strings and plain strings,
# because we never use u'xxx' for string literals, but 'xxx'.
# Therefore we may have to try two times to normalize the data.
try:
    document.body[i] = unicodedata.normalize("NFKD", document.body[i])
except TypeError:
    document.body[i] = unicodedata.normalize("NFKD",
                                             unicode(document.body[i], 'utf-8'))
That works, now I have to find the next bug :-(
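
For illustration only, the same fallback could be sketched as a small helper that checks the type up front instead of catching TypeError. The name normalize_line and its encoding parameter are hypothetical and not taken from the code discussed here. The sketch assumes the plain strings are UTF-8 encoded, as in the workaround above, and an interpreter where the name bytes exists (Python 2.6 or later, or Python 3).

```python
import unicodedata


def normalize_line(line, encoding='utf-8'):
    """Return the NFKD-normalized unicode form of `line`.

    Hypothetical helper: accepts either a unicode string or a plain
    byte string and decodes the latter first (assumed to be UTF-8).
    """
    # On Python 2, `bytes` is an alias for the plain `str` type,
    # so this branch catches literals written as 'xxx' rather than u'xxx'.
    if isinstance(line, bytes):
        line = line.decode(encoding)
    return unicodedata.normalize("NFKD", line)
```

With such a helper the loop body would reduce to document.body[i] = normalize_line(document.body[i]). The explicit type check has the side benefit of not hiding unrelated TypeErrors, though the try/except version above behaves the same for the two string types mentioned.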
Georg