I may be joining the translation discussion shortly; I have a site that is
using Russian, French, English and Indonesian.
I'm importing pages from WordPress (pages, not blog entries) and I get the
dreaded "UnicodeDecodeError: 'ascii' codec can't decode byte..." error when I
save the RichTextPage object created from the WordPress page. It is raised in
the Mezzanine code that joins up the titles and in the code that gets the
'description_from_content'.
USE_I18N is True in settings.
Obviously I don't want to lose the Cyrillic characters and I need to get
these pages imported. I've tried various options and the best one so far
seems to be using kitchen's to_unicode and to_bytes, e.g.:
    from kitchen.text.converters import to_bytes, to_unicode
    ...
    def import_page(self, page, pages):
        title = to_unicode(page['post_title'])
        self.vprint("BEGIN Importing page '{0}'".format(to_bytes(title)), 1)
        mezz_page = self.get_or_create(RichTextPage, title=title)
        if page['post_parent'] > 0:  # there is a parent
            mezz_page.parent = self.get_mezz_page(page['post_parent'], pages)
        mezz_page.created = page['post_modified']
        mezz_page.updated = page['post_modified']
        mezz_page.content = to_unicode(page['post_content'])
        mezz_page.save()
The parent bit is a work in progress, but this works for the content and
title: it retains the Cyrillic characters correctly. However, it seems
unwieldy. Is this approach a good one or should I be doing something else?
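
For comparison, here is a rough sketch of the same conversion done without
kitchen, decoding once at the point where the WordPress data comes in. The
helper name ensure_text is made up for illustration, and the sketch assumes
the byte strings from WordPress are UTF-8 encoded, which I have not verified
for every field.

```python
# -*- coding: utf-8 -*-


def ensure_text(value, encoding="utf-8"):
    """Return value as a text (unicode) string.

    Hypothetical helper, not part of Mezzanine or kitchen. Assumes any
    byte string passed in is encoded as UTF-8.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    # Already text (or not a string at all): pass it through unchanged.
    return value


# Stand-in for a value read from the WordPress database as raw bytes.
raw_title = u"Привет, мир".encode("utf-8")

title = ensure_text(raw_title)

# Building the log message from a unicode template keeps the whole
# expression in text, so no implicit 'ascii' decode is attempted.
message = u"BEGIN Importing page '{0}'".format(title)
```

The idea is to decode at the boundary and keep everything as text after
that, so to_bytes would only be needed where something actually requires
bytes.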