Hi Mikhail,

Thanks for your feedback! Your email touches many topics; can we try
to extract the most important ones and identify what really needs fixing
before the 1.5 release?

> 1) @python_2_unicode_compatible doesn't handle __repr__.

Indeed, this decorator isn't designed to handle __repr__ — I haven't
even tried.

At first sight, I'd say this isn't a sufficient reason to delay the 1.5 release,
but it's a feature we should consider for 1.6.

>    For example, this affects django.db.models.options.Options,
>    django.core.files.base.File (and ContentFile),
>    django.contrib.admin.models.LogEntry, django.template.base.Variable
>    and probably many others (their __repr__ incorrectly returns unicode).

Is it a regression in 1.5? Or did the problem already exist?


> 2) under Python 2.x __str__ is implemented as __unicode__
>    encoded to utf8.

Yes, this is a legacy behavior that I strongly disagree with. It
makes __str__ / __unicode__ handling for Model subclasses
exceedingly sketchy.* The only reason why I didn't remove
it is backwards compatibility.

* I expect future maintainers, including myself, to hate this:
https://github.com/django/django/commit/2ea80b94

> This breaks 'print django_obj' when sys.stdout.encoding
>    is not utf8 because print uses __str__ (not __unicode__) for custom 
> objects,
>    and the terminal expects the result to be encoded in sys.stdout.encoding
>    (print encodes unicode strings to sys.stdout.encoding, but doesn't
>    use __unicode__ of objects; this is hard-coded in Python 2.x). 
>    This may affect REPL in Windows consoles and printing/writing to stdout 
>    in management commands.

Django often defaults to utf-8 rather than try to figure out the
most appropriate charset. Such code generally dates back
to the unicode branch. So, yeah, this problem exists but it's
not really new.


> 3) @python_2_unicode_compatible produces incorrect results 
>    when applied twice (__str__ is patched by previous decorator application 
>    and returns bytestring because of that).
>    This is easy to oversight e.g. when applying this decorator to a
>    subclass of a class which is wrapped to @python_2_unicode_compatible
>    and deleting the overridden __str__ afterwards.

This is a good point. I looked at the `_was_fixed` technique you're
proposing below. It does work in more cases, but I'm pretty sure
you can still break it by overriding __str__ or __unicode__ in a
sufficiently complex class hierarchy.


> 4) __str__ is not always properly implemented for this decorator in django
>    code. To work properly with @python_2_unicode_compatible,
>    __str__ must return unicode string. This is quite subtle.

This is destabilizing for Python 2 developers, but once you get it,
you get the whole "write Python 3 code that also works under
Python 2" philosophy, and it's documented. I stand by this decision.

>    For example, take a look at django.contrib.gis.maps.google.GEvent.
>    __str__ is implemented as 
>    "return mark_safe('"%s", %s' %(self.event, self.action))",
>    but "from __future__ import unicode_literals" is not applied to the file.

django.contrib.gis is a large piece of code that hasn't received as
much attention as core. There's certainly a bunch of bugs like this one.

>    This means that if event and action are Python objects with both __str__
>    and __unicode__ methods defined (e.g. object of class wrapped with
>    python_2_unicode_compatible) then __str__ would be called for these 
> objects,
>    not __unicode__ (because the format string is a bytestring). Generally,
>    "%s" % something is a good and correct pattern for __str__ implementation
>    (it does the right thing under both Python 2.x and 3.x when
>    unicode_literals future import is there), but it is incorrect under Python
>    2.x if unicode_literals is not imported.

I don't understand. Unless you're using @python_2_unicode_compatible,
__str__ must return a string. `"%s" % foo` works in Python 2 and 3 *unless*
you enable unicode_literals; if you do, the right pattern is `str("%s") % foo`.

> 5) %r is very tricky. If unicode_literals are in effect, or some 
>    arguments for string formatting are unicode,
>    "%r" % obj would trigger bytes decoding using sys.getdefaultencoding() 
> under
>    Python 2.x (unless obj is an unicode string), and if obj.__repr__ returns
>    non-ascii text or obj is a bytestring, exception would be raised
>    (because sys.getdefaultencoding() is usually ascii).
>    This format specifier is used, for example, in a default_error_messages
>    for django.db.models.fields.Field; after switching to unicode_literals
>    this may start raising UnicodeDecodeExceptions for non-ascii choices
>    if they are custom objects (not unicode strings).
>    Another example is 
> django.http.response.HttpResponseBase._convert_to_charset
>    where BadHeaderError exception is raised: after switching to 
> unicode_literals
>    %r format specifier start triggering decoding of "value" using 
> sys.getdefaultencoding()
>    which is incorrect because "value" is a bytestring of 'charset' encoding 
> under
>    Python 2.x. Another example is django.utils.datastructures.SortedDict:
>    its __repr__ uses '%r: %r' % (k, v) for k, v in six.iteritems(self)
>    which may fail if key is an unicode string and a value is a bytestring
>    or an object with __repr__ returning non-ascii text. Another example
>    is django.utils.encoding.DjangoUnicodeDecodeError
>    (it has incorrect __str__ by the way because it returns unicode) -
>    it uses "%r" for self.obj, with unicode string formatter,
>    and this would blow up if __repr__ of obj returns non-ascii text.
>    There are other places where %r is used and they all are fragile.

If I understand correctly, you're saying there's a regression in
modules that define __repr__ methods and where unicode_literals
were enabled?

> I've implemented an another python_2_unicode_compatible decorator (inspired 
> by django's, the idea is cool) for NLTK: 
> https://github.com/nltk/nltk/blob/2and3/nltk/compat.py#L122 which resolves 
> some of issues above (it handles __repr__, limits __str__ and __repr__ to 
> ascii and supports subclassing better). The article (rather lengthy, with 
> some django bashing :) that provides motivation for the decorator used in 
> NLTK: http://kmike.ru/python-with-strings-attached/ (the code in the article 
> is a bit outdated, it is not the code used in NLTK; NLTK version was 
> improved, but I didn't update the article yet).


Obviously your version is more elaborate and handles more cases.
The concept of `_was_fixed` is interesting but as explained above
I'm not sure we can make it fully reliable. I'm also concerned about
the transliteration / 7 bit stuff — it smell like something people are
going to complain about or bikeshed forever.

The real question here is -- how much complexity is a core dev willing
to introduce into Django, and defend in front of tens of thousands of
developers for the next few years? As far as I'm concerned, the answer
is "very little", because I'm not very smart and my days only have so
many hours.


To sum up, I think most of your remarks are valid. But taken all together
they aren't easily actionable. You're mixing existing behaviors that we
must keep for backwards-compatibility (even if they're debatable) and
new behaviors that we still have a chance to fix before 1.5 sets them in
stone. And you're requesting a massive lot of changes.

To move forward, I'd suggest to divide and classify all this into smaller
sets of problems, and describe them precisely:
- regressions introduced by Python 3 support (release blockers),
- limitations of the current Python 3 support,
- str / repr method that don't return the correct type,
- legacy design decisions that appear suboptimal,
- etc.

Thanks,

-- 
Aymeric.



-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.

Reply via email to