[GitHub] [airflow] potiuk edited a comment on pull request #13984: Fixed reading from zip package to default to text.

GitBox Sat, 30 Jan 2021 07:35:17 -0800


potiuk edited a comment on pull request #13984:
URL: https://github.com/apache/airflow/pull/13984#issuecomment-770229702



   I think the solution is not complete, as it does not properly take Python 
source encoding into account. It is also currently (potentially) wrong not only 
for the "zipped" case but also for the "non-zipped" case. Maybe there is a 
chance to fix it for both cases. That would likely require changing the 
interface of the `open_maybe_zipped` function slightly. 
   
   In Python 3 the default source encoding is UTF-8, and I guess that covers 
the vast majority of cases, but a file can declare a different encoding as 
defined by PEP 263: https://www.python.org/dev/peps/pep-0263/ . Such 
declarations are rarely used in Python 3, but there are still cases where they 
are useful. Moreover, different Python files can use different encodings, while 
we seem to always use the same (default) encoding, as returned by 
`locale.getpreferredencoding(False)` (see 
https://docs.python.org/3/library/io.html#io.TextIOWrapper). 
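   
   A small sketch of the problem, using a made-up source file (the bytes and 
variable names below are purely illustrative and not taken from Airflow): a 
file that declares `latin-1` through a PEP 263 cookie decodes fine with the 
declared encoding, but not with UTF-8, which is a common locale default. 
   
```python
import io
import locale

# Hypothetical Python source that declares latin-1 via a PEP 263 cookie.
source_bytes = "# -*- coding: latin-1 -*-\nname = 'café'\n".encode("latin-1")

# The encoding TextIOWrapper falls back to when none is passed explicitly.
default_encoding = locale.getpreferredencoding(False)

# Decoding with the declared encoding recovers the original text.
declared_text = io.TextIOWrapper(
    io.BytesIO(source_bytes), encoding="latin-1"
).read()

# Decoding as UTF-8 fails: the latin-1 byte for "é" is not valid UTF-8 here.
try:
    source_bytes.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```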
   
   However, I believe this function is only used to read Python sources, and 
Python 3 has a way to detect the encoding of Python source files. Two functions 
in the standard library (both available as of Python 3.2) can be used: 
   
   * https://docs.python.org/3/library/tokenize.html#tokenize.detect_encoding
   * https://docs.python.org/3/library/tokenize.html#tokenize.open 
   
   Both read the BOM of a file (if present) or follow PEP 263 (if not) to 
detect the file encoding. I think it would not be too complex to use them to 
reliably detect the encoding of Python files.
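   
   A minimal sketch of how this could look for the zipped case, assuming the 
raw bytes of the file are available. `read_python_source` and `dag.py` are 
hypothetical names used only for illustration, not an existing Airflow API. For 
the non-zipped case `tokenize.open(path)` already does the detection. 
   
```python
import io
import tokenize
import zipfile


def read_python_source(data: bytes) -> str:
    """Decode Python source bytes using the BOM or PEP 263 cookie, if any."""
    # detect_encoding takes a readline callable and inspects at most two lines.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
    # Decode through TextIOWrapper so newlines are handled as in text mode.
    return io.TextIOWrapper(io.BytesIO(data), encoding=encoding).read()


# Build a small in-memory archive holding a latin-1 encoded source file.
source = "# -*- coding: latin-1 -*-\nname = 'café'\n".encode("latin-1")
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("dag.py", source)

# Read the member as bytes and decode it with the detected encoding.
with zipfile.ZipFile(archive) as zf:
    with zf.open("dag.py") as raw:
        text = read_python_source(raw.read())
```
   
   Reading the whole member into memory first avoids relying on the zip stream 
being seekable after `detect_encoding` has consumed the first lines. 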
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
