Actually, I tried applying your patch, and now my code doesn't crash.
I just seem to lose wide characters during mwlib conversion.

   Joseph

On Feb 23, 7:42 pm, Joseph Turian <[email protected]> wrote:
> On Feb 9, 5:34 am, Osipov <[email protected]> wrote:
>
> > > This writes non unicode, which is incorrect in general. I think
> > > Python:
> > >         unicode(self.out.write(cgi.escape(s.encode("utf8"))),'utf-8')
>
> > > must be here.
>
> > Oh, I'm sorry, Python:
> >   self.out.write(unicode(cgi.escape(s.encode("utf8")),'utf8'))
>
> > is correct. And it works correctly with all my examples.
>
> I am trying to use the following code to convert 'wikitext' (which is
> utf-8) to HTML.
>
>     out=StringIO.StringIO()
>     a=uparser.parseString(j.find("title").text, raw=wikitext,
>                           wikidb=dummydb.DummyDB())
>     w=htmlwriter.HTMLWriter(out, None)
>     w.write(a)
>     html = out.getvalue()
>
> However, I get similar unicode errors as you:
>
> Traceback (most recent call last):
>   File "./extract-descriptions.py", line 83, in <module>
>     html = out.getvalue()
>   File "/usr/lib64/python2.5/StringIO.py", line 270, in getvalue
>     self.buf += ''.join(self.buflist)
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 80: ordinal not in range(128)
>
> How do I avoid these errors?
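
For what it's worth, the traceback above is what Python 2's StringIO produces when the buffer holds a mix of unicode objects and 8-bit strings with non-ASCII bytes: getvalue() joins them and the byte strings get decoded implicitly as ASCII. Below is a minimal sketch of the usual remedy, which is to decode bytes once, explicitly, and write only text. It is written for Python 3 for illustration (the thread uses Python 2.5), and ascii_decode_fails and write_text are hypothetical helper names, not part of mwlib.

```python
import io


def ascii_decode_fails(data):
    """Return True if decoding `data` as ASCII raises UnicodeDecodeError."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return True
    return False


def write_text(chunks, encoding="utf-8"):
    """Write chunks to a text buffer, decoding any byte chunks explicitly.

    Hypothetical helper: every chunk is turned into text before it reaches
    the buffer, so the buffer never holds a mix of bytes and text.
    """
    out = io.StringIO()
    for chunk in chunks:
        if isinstance(chunk, bytes):
            chunk = chunk.decode(encoding)  # explicit decode, never implicit ASCII
        out.write(chunk)
    return out.getvalue()


# UTF-8 bytes for a copyright sign; the first byte is 0xc2, as in the traceback.
raw = "\u00a9 2009".encode("utf-8")

# Decoding these bytes as ASCII fails, which is what the implicit join does.
fails = ascii_decode_fails(raw)

# Decoding as UTF-8 up front keeps the non-ASCII character intact.
html = write_text(["<p>", raw, "</p>"])
```

In the Python 2 code from the thread, the analogous step would be to make sure everything written to `out` is of one type, for example by decoding the UTF-8 wikitext to unicode before parsing and encoding only once at the very end.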