Actually, I tried applying your patch, and now my code doesn't crash. I just seem to lose wide characters during mwlib conversion.

Joseph

On Feb 23, 7:42 pm, Joseph Turian wrote:
> On Feb 9, 5:34 am, Osipov wrote:
>
> > > This writes non unicode, which is incorrect in general. I think
> > > Python:
> > > unicode(self.out.write(cgi.escape(s.encode("utf8"))),'utf-8')
>
> > > must be here.
>
> > Oh, I'm sorry, Python:
> > self.out.write(unicode(cgi.escape(s.encode("utf8")),'utf8'))
>
> > is correct. And it works correctly with all my examples.
>
> I am trying to use the following code to convert 'wikitext' (which is
> utf-8) to HTML.
>
> out=StringIO.StringIO()
> a=uparser.parseString(j.find("title").text, raw=wikitext,
> wikidb=dummydb.DummyDB())
> w=htmlwriter.HTMLWriter(out, None)
> w.write(a)
> html = out.getvalue()
>
> However, I get similar unicode errors as you:
>
> Traceback (most recent call last):
>   File "./extract-descriptions.py", line 83, in <module>
>     html = out.getvalue()
>   File "/usr/lib64/python2.5/StringIO.py", line 270, in getvalue
>     self.buf += ''.join(self.buflist)
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
> 80: ordinal not in range(128)
>
> How do I avoid these errors?
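
For reference, the traceback above looks like the usual symptom of a buffer that holds both unicode strings and UTF-8 byte strings, so that the join falls back to ASCII decoding. That reading is an assumption, not something confirmed in this thread. Below is a minimal sketch of the idea of decoding explicitly before writing. It is written for Python 3 so that it runs on a current interpreter (the thread itself uses Python 2.5), and the helper to_text and the sample fragments are hypothetical, not part of mwlib.

```python
import io


def to_text(value, encoding="utf-8"):
    """Return value as text, decoding byte strings explicitly.

    Hypothetical helper for illustration; not part of mwlib.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Made-up sample data: one text fragment and one UTF-8 byte fragment.
# The byte fragment starts with 0xc2, the first byte of U+00A0.
fragments = ["<p>caf\u00e9", b"\xc2\xa0au lait</p>"]

# Decoding the byte fragment as ASCII fails on the first non-ASCII byte,
# which is the same kind of error the traceback reports.
try:
    fragments[1].decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = str(exc)

# Decoding explicitly at the boundary keeps the buffer text-only,
# so getvalue() has nothing left to decode implicitly.
out = io.StringIO()
for fragment in fragments:
    out.write(to_text(fragment))
html = out.getvalue()
```

The point of the sketch is only that every fragment is converted to text once, at the place where it enters the buffer, instead of relying on an implicit conversion when the buffer is joined.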
