[mezzanine-users] How to handle unicode posts and titles

Paul Whipp Wed, 30 Apr 2014 02:39:07 -0700

I may be joining the translation discussion shortly; I have a site that is 
using Russian, French, English and Indonesian.


I'm importing pages from WordPress (pages, not blog entries) and I get the 
dreaded "UnicodeDecodeError: 'ascii' codec can't decode byte..." error when I 
save the RichTextPage object created from the WordPress page. It is raised in 
the Mezzanine code that joins up the titles and in the code that gets the 
'description_from_content'.
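
For context, this error typically shows up in Python 2 when a byte string 
holding non-ASCII bytes is mixed with a unicode string, because the bytes are 
then decoded implicitly with the ascii codec. Below is a minimal sketch that 
triggers the same failure with an explicit decode. The sample title and the 
helper name decode_ascii are made up for illustration, and UTF-8 input is an 
assumption.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample: a Cyrillic title held as UTF-8 encoded bytes, which is
# an assumption about how the data arrives from the WordPress database.
raw_title = u"Привет".encode("utf-8")


def decode_ascii(data):
    """Decode bytes as ASCII; return the text, or the error message on failure."""
    try:
        return data.decode("ascii")
    except UnicodeDecodeError as exc:
        return "error: {0}".format(exc)


# Decoding as ASCII fails on the first Cyrillic byte.
ascii_result = decode_ascii(raw_title)
# Decoding with the encoding the bytes were written in succeeds.
utf8_result = raw_title.decode("utf-8")

print(ascii_result)
print(utf8_result == u"Привет")
```

Naming the real encoding at the point of decoding is what avoids the error.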

USE_I18N is True in settings.

Obviously I don't want to lose the Cyrillic characters and I need to get 
these posts imported. I've tried various options and the best one so far 
seems to be using kitchen's to_unicode and to_bytes, e.g.:


from kitchen.text.converters import to_bytes, to_unicode
...

    def import_page(self, page, pages):
        # Decode the title once so the rest of the method works with unicode.
        title = to_unicode(page['post_title'])
        # Encode back to bytes only for the console message.
        self.vprint("BEGIN Importing page '{0}'".format(to_bytes(title)), 1)
        mezz_page = self.get_or_create(RichTextPage, title=title)
        if page['post_parent'] > 0:  # there is a parent
            mezz_page.parent = self.get_mezz_page(page['post_parent'],
                                                  pages)
        mezz_page.created = page['post_modified']
        mezz_page.updated = page['post_modified']
        # The content is decoded as well before it reaches the model.
        mezz_page.content = to_unicode(page['post_content'])
        mezz_page.save()

The parent handling is a work in progress, but this works for the content and 
title: it retains the Cyrillic characters correctly. However, it seems 
unwieldy. Is this approach a good one, or should I be doing something else?
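
One possible alternative, sketched here rather than taken from Mezzanine or 
kitchen, is to decode every text field once when the page is read. The import 
code then only ever handles unicode. The helper names to_text and 
decode_fields are hypothetical, and the sketch assumes the WordPress data is 
UTF-8 encoded.

```python
# -*- coding: utf-8 -*-


def to_text(value, encoding="utf-8"):
    """Return value as text, decoding it only if it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def decode_fields(record, fields, encoding="utf-8"):
    """Return a copy of record with the named fields converted to text."""
    decoded = dict(record)
    for name in fields:
        decoded[name] = to_text(decoded[name], encoding)
    return decoded


# Hypothetical row shaped like the page dict used in import_page.
page = {
    "post_title": u"Привет".encode("utf-8"),
    "post_content": u"<p>Содержимое</p>".encode("utf-8"),
    "post_parent": 0,
}

# Only the text fields are decoded; other values pass through untouched.
clean = decode_fields(page, ["post_title", "post_content"])
print(clean["post_title"] == u"Привет")
```

With the row cleaned up front, import_page could use the values directly 
and would not need a to_unicode call at each assignment.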
