#5663: markdown 1.6b unicodedecodeerror
-------------------------------------------------------------------+--------
Reporter: Koen Biermans <[EMAIL PROTECTED]>  | Owner: mboersma
Status: reopened                             | Milestone:
Component: Contrib apps                      | Version: SVN
Resolution:                                  | Keywords: markdown
Stage: Accepted                              | Has_patch: 1
Needs_docs: 0                                | Needs_tests: 0
Needs_better_patch: 0                        |
-------------------------------------------------------------------+--------
Comment (by wayla):
Replying to [comment:13 Daniel Pope <[EMAIL PROTECTED]>]:
>
> I can confirm by empirical testing that 1.6b does require unicode
> strings as input. Your memory has failed you in this case.
>
Actually, I wrote the code that changed unicode support between 1.6b and
1.7, and I think I remember what I did. Yes, 1.6b did have ''some'' support
for unicode, but it did not ''require'' it as you state. It also worked with
(most) bytestrings. The fact that it supported unicode at all was kind of
a fluke: it just so happens that the Python {{{re}}} module runs just as
well on unicode strings as it does on bytestrings. So 1.6b added a
{{{__unicode__}}} method to the Markdown class which simply wrapped
{{{__str__}}} ({{{return str(self)}}}). In contrast, 1.7 raises a fatal
error before running if it does not get a unicode string (or a bytestring
containing only ASCII characters, as those are a subset of unicode anyway)
and will ''only'' return unicode.
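To make the 1.7 input rule concrete, here is a minimal sketch of that kind of check. It is not the actual Markdown 1.7 code: {{{ensure_text}}} is a hypothetical name, and it is written for Python 3, where {{{str}}} stands in for the old {{{unicode}}} type and {{{bytes}}} for a bytestring.

```python
def ensure_text(source):
    """Return `source` as text, or raise if it cannot safely be treated as text.

    Hypothetical helper, not part of any Markdown release. Python 3 `str`
    plays the role of Python 2 `unicode`; `bytes` plays the role of a
    Python 2 bytestring.
    """
    if isinstance(source, str):
        # Already text: nothing to do.
        return source
    if isinstance(source, bytes):
        try:
            # ASCII is a subset of Unicode, so an ASCII-only bytestring
            # can be decoded without guessing an encoding.
            return source.decode("ascii")
        except UnicodeDecodeError:
            # Fail before any processing instead of producing garbage later.
            raise ValueError(
                "non-ASCII bytestring; decode it to text first"
            ) from None
    raise TypeError("expected text or bytes, got %s" % type(source).__name__)
```

The point of failing up front is that the caller learns about an encoding problem immediately, rather than when some user submits text that happens to contain a non-ASCII character.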
Oh, and the reason your first example works:
{{{
>>> markdown.markdown(i)
u'<p>\u20ac\xa3\xbd\n</p>'
}}}
is because the {{{markdown}}} wrapper function did not call either
{{{Markdown.__str__}}} or {{{Markdown.__unicode__}}}, but did its own
thing, reimplementing most of {{{Markdown.__str__}}}. In other words, by
using the wrapper (shortcut) in 1.6b, you get different behavior than if
you call the class directly. That's buggy and I don't recommend it.
Btw, using your example in 1.6b, you should have done this:
{{{
>>> markdown.markdown(i.encode('utf8'), encoding='utf8')
}}}
But even that was buggy. Shortly after Malcolm submitted a patch (which I
applied), we threw away most of that convert-to-unicode stuff (we kept
just enough only for use from the command line - including Malcolm's
patch) and forced the requirement that Markdown only accept unicode.
That's the difference here. 1.7 is the only version where we can be
absolutely sure that it is safe to pass unicode text to markdown. It may
or may not work in ''any'' earlier version. As a Markdown core dev, I will
not guarantee that it will work all the time in anything but 1.7. Sure it
may work fine for you in testing, but then some user will submit some text
that fails and breaks your site. Debian/Ubuntu need to do the right thing
here and provide 1.7 which passes a version test for 1.7.
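For illustration only, one way a caller could act on such a version test is sketched below. Everything in it is an assumption: {{{render}}}, {{{markdown_fn}}} and {{{version_info}}} are hypothetical names, the conversion function is passed in rather than imported, and the sketch is written for Python 3, where {{{str}}} stands in for unicode and {{{bytes}}} for a bytestring. It is not Django's code or Markdown's API, and as noted above the pre-1.7 path carries no guarantee.

```python
def render(text, markdown_fn, version_info):
    """Call `markdown_fn` with the input type its version can be trusted with.

    All parameters are hypothetical stand-ins: `markdown_fn` for the
    library's conversion function, `version_info` for a version tuple
    (or None when the installed library does not report one).
    """
    if version_info is not None and tuple(version_info) >= (1, 7):
        # 1.7 or later: text in, text out.
        return markdown_fn(text)
    # Older or unknown version: hand over UTF-8 bytes instead of text,
    # then decode whatever comes back so callers always receive text.
    result = markdown_fn(text.encode("utf-8"))
    if isinstance(result, bytes):
        result = result.decode("utf-8")
    return result
```

Treating a missing version as "old" is the conservative choice here: only a library that positively identifies itself as 1.7 gets text passed straight through.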
In any event, it's up to the Django core devs to decide what to do in
Django.
--
Ticket URL: <http://code.djangoproject.com/ticket/5663#comment:14>
Django Code <http://code.djangoproject.com/>
The web framework for perfectionists with deadlines