#19398: django.utils._os.safe_join should return a native string
-------------------------------------------+----------------------------
               Reporter:  aaugustin        |          Owner:  nobody
                   Type:  Bug              |         Status:  new
              Component:  Python 2         |        Version:  1.5-beta-1
               Severity:  Release blocker  |       Keywords:
           Triage Stage:  Unreviewed       |      Has patch:  0
    Needs documentation:  0                |    Needs tests:  0
Patch needs improvement:  0                |  Easy pickings:  0
                  UI/UX:  0                |
-------------------------------------------+----------------------------
 By default, filesystem paths are represented with native strings (i.e.
 `str` objects) in Python 2 and Python 3.

 {{{
 % python2
 >>> import os
 >>> type(os.listdir('.')[0])
 <type 'str'>
 }}}

 {{{
 % python3
 >>> import os
 >>> type(os.listdir('.')[0])
 <class 'str'>
 }}}

 In other words, they were switched from bytestrings to unicode in Python 3.

 ----

 ''A brief interlude for perfectionists and pedants :)''

 In Python 2, it's possible to use unicode for filesystem paths when
 `os.path.supports_unicode_filenames` is `True`, but that's not the default
 mode of operation.

 In Python 3, it's possible to use bytestrings for filesystem paths,
 because not all supported platforms sport unicode-aware filesystems: see
 http://docs.python.org/3/library/os.path:
 > The path parameters can be passed as either strings, or bytes.
 Applications are encouraged to represent file names as (Unicode) character
 strings.

 My initial statement still reflects the intent of Python's developers,
 from which Django shouldn't deviate.
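
 As a small illustration of the point above, the following sketch (Python 3
 only, with made-up path components) shows that `os.path.join` returns the
 same string type it was given, and that `os.fsencode` / `os.fsdecode`
 convert between the two using the filesystem encoding:

```python
import os

# Native strings in, native string out.
text_path = os.path.join("static", "css")

# Bytestrings in, bytestring out (accepted by Python 3 as well).
bytes_path = os.path.join(b"static", b"css")

# os.fsencode / os.fsdecode convert using the filesystem encoding
# rather than a hard-coded one.
round_tripped = os.fsdecode(os.fsencode(text_path))

print(type(text_path).__name__, type(bytes_path).__name__)
```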

 ----

 The conversion to unicode was introduced 4 years ago in
 8fb1459b5294fb9327b241fffec8576c5aa3fc7e. This commit was fixing an issue
 with the reporting of template loading errors.

 In hindsight, it would have been better to keep `safe_join` similar to
 `os.path.join`, and preprocess the arguments or introduce a `safe_joinu`
 method.

 Excluding tests, `safe_join` is used in four places in Django. Auditing
 these for proper use of bytestrings vs. unicode strings seems doable.

 `safe_join` isn't documented and the name `_os` is a strong hint that it's
 a private API.

 Therefore, I propose:
 - to remove the coercion to unicode, which is incorrect anyway because
 it doesn't honor `sys.getfilesystemencoding()` and thus fails on
 filesystems that don't use UTF-8;
 - to perform the coercion in callers that need it, or remove it altogether
 if possible.
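
 A minimal sketch of what this proposal could look like, not the actual
 patch. The names `safe_join_native` and `to_text_path` are hypothetical,
 and the sketch assumes native-string arguments: the join no longer coerces
 anything, and a caller that needs text decodes with
 `sys.getfilesystemencoding()` itself.

```python
import sys
from os.path import abspath, join, normcase, sep


def safe_join_native(base, *paths):
    """Hypothetical safe_join variant that does not coerce its arguments.

    Joins ``paths`` onto ``base`` and raises ValueError if the result is
    located outside of ``base``. Like os.path.join, it returns the same
    string type it was given (native strings are assumed here).
    """
    final_path = normcase(abspath(join(base, *paths)))
    base_path = normcase(abspath(base))
    # Build the prefix every path below base must start with.
    prefix = base_path if base_path.endswith(sep) else base_path + sep
    if final_path != base_path and not final_path.startswith(prefix):
        raise ValueError("The joined path is located outside of the base path")
    return final_path


def to_text_path(path):
    """Hypothetical caller-side helper: decode a bytestring path to text.

    Uses the filesystem encoding instead of assuming UTF-8.
    """
    if isinstance(path, bytes):
        encoding = sys.getfilesystemencoding() or sys.getdefaultencoding()
        return path.decode(encoding)
    return path
```

 With this split, a caller such as a template loader that reports errors
 would call `to_text_path(safe_join_native(root, name))`, while callers
 that hand the result straight to the filesystem would use the return
 value as-is.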

 ----

 This bug is preventing me from fixing #19357, which is a release blocker,
 because the static files finders are using `safe_join`. It is also the
 root cause for #17686.

-- 
Ticket URL: <https://code.djangoproject.com/ticket/19398>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
