potiuk edited a comment on pull request #13984:
URL: https://github.com/apache/airflow/pull/13984#issuecomment-770229702


   I think the solution is not complete, as it does not properly handle Python 
source encoding. It is also (potentially) wrong not only for the "zipped" case 
but also for the "non-zipped" case. Maybe there is a chance to fix it for both 
cases. That would likely require a slight change to the interface of the 
`open_maybe_zipped` function. 
   
   In Python 3 the default source encoding is UTF-8, and I guess it covers the 
vast majority of cases, but a file can declare a different encoding as defined 
by PEP 263: https://www.python.org/dev/peps/pep-0263/ . Such declarations are 
rarely used in Python 3, but there are still cases where they are useful. 
Moreover, different Python files can use different encodings, while we seem to 
always use the same (default) encoding, as defined by 
`locale.getpreferredencoding(False)` (see 
https://docs.python.org/3/library/io.html#io.TextIOWrapper). 
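
   To illustrate the problem, here is a minimal sketch (not Airflow code). The 
sample source bytes and the `decode_with` helper are hypothetical and exist 
only for this example. It shows that a file declaring `latin-1` via a PEP 263 
cookie cannot be read reliably with a fixed encoding:

```python
import locale

# Hypothetical source file content that declares its encoding (PEP 263 cookie).
SOURCE_BYTES = "# -*- coding: latin-1 -*-\nname = 'café'\n".encode("latin-1")


def decode_with(encoding: str) -> str:
    """Decode the raw bytes the way a text-mode open() with this encoding would."""
    return SOURCE_BYTES.decode(encoding)


# The encoding a text-mode open() falls back to when none is passed explicitly.
print("locale default:", locale.getpreferredencoding(False))

# Decoding as UTF-8 ignores the cookie and fails on the latin-1 byte for 'é'.
try:
    decode_with("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding with the declared encoding works.
latin1_text = decode_with("latin-1")
print("utf-8 decode succeeded:", utf8_ok)
```

   Whether the locale default happens to match depends on the machine, which 
is why relying on it seems fragile.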
   
   However, I believe this function is only used to read Python sources, and 
Python 3 has a way to detect the encoding of Python source files. It is in the 
standard library. 
   
   These two functions can be used (both available in Python 3.2 and later): 
   
   * https://docs.python.org/3/library/tokenize.html#tokenize.detect_encoding
   * https://docs.python.org/3/library/tokenize.html#tokenize.open 
   
   They both read the BOM of a file (if present) or follow PEP 263 to detect 
the file encoding. I think it would not be too complex to use them to reliably 
detect the encoding of Python files.
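
   A rough sketch of how this could look for both cases. This is an 
assumption about one possible approach, not the actual `open_maybe_zipped` 
implementation; `read_python_source`, `dag.py` and `dags.zip` are hypothetical 
names used only for illustration:

```python
import io
import os
import tempfile
import tokenize
import zipfile

# Hypothetical source file content that declares its encoding (PEP 263 cookie).
SOURCE_BYTES = "# -*- coding: latin-1 -*-\nname = 'café'\n".encode("latin-1")


def read_python_source(raw: bytes) -> str:
    """Decode Python source bytes using the BOM or the PEP 263 cookie."""
    # detect_encoding() takes a readline callable that yields bytes.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(raw).readline)
    return raw.decode(encoding)


with tempfile.TemporaryDirectory() as tmp:
    # Non-zipped case: tokenize.open() detects the encoding for us.
    plain_path = os.path.join(tmp, "dag.py")
    with open(plain_path, "wb") as f:
        f.write(SOURCE_BYTES)
    with tokenize.open(plain_path) as f:
        plain_text = f.read()

    # Zipped case: read the member as bytes, then detect and decode.
    zip_path = os.path.join(tmp, "dags.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("dag.py", SOURCE_BYTES)
    with zipfile.ZipFile(zip_path) as zf:
        zipped_text = read_python_source(zf.read("dag.py"))

detected, _ = tokenize.detect_encoding(io.BytesIO(SOURCE_BYTES).readline)
print("detected encoding:", detected)
```

   In this sketch the zipped and non-zipped paths produce the same text, 
because both rely on the encoding declared in the file rather than on the 
locale default.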
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

