potiuk edited a comment on pull request #13984: URL: https://github.com/apache/airflow/pull/13984#issuecomment-770229702
I think the solution is incomplete, because it does not handle the encoding of Python source files. It is also (potentially) wrong for the "non-zipped" case, not only for the "zipped" one. There may be a chance to fix both, though it would likely require a slight change to the interface of the `open_maybe_zipped` function.

In Python 3 the default source encoding is UTF-8, and I guess that covers the vast majority of cases, but a file can declare a different encoding as defined by PEP 263: https://www.python.org/dev/peps/pep-0263/ . Such declarations are rarely used in Python 3, but there are still cases where they are useful. Moreover, different Python files can use different encodings, while we seem to always use the same (default) one, as returned by `locale.getpreferredencoding(False)` (see https://docs.python.org/3/library/io.html#io.TextIOWrapper).

However, I believe this function is only used to read Python sources, and Python 3 has a way to detect the encoding of Python source files. The standard library provides two functions for this (both available since Python 3.2):

* https://docs.python.org/3/library/tokenize.html#tokenize.detect_encoding
* https://docs.python.org/3/library/tokenize.html#tokenize.open

They both read the BOM of a file (if present) or follow PEP 263 (if not) to detect the file encoding. I think it would not be too complex to use them to reliably detect the encoding of Python files.
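To make the suggestion concrete, here is a rough sketch of how the two `tokenize` functions could cover both the plain and the zipped case. The helper name `read_python_source` and its `member` parameter are hypothetical and chosen only for illustration; this is not the current `open_maybe_zipped` interface, just one possible shape for it.

```python
import io
import tokenize
import zipfile


def read_python_source(path, member=None):
    """Return decoded Python source, honoring a BOM or a PEP 263 cookie.

    Hypothetical helper for illustration only. `path` is a .py file, or a
    zip archive when `member` names a file inside it.
    """
    if member is None:
        # tokenize.open detects the encoding and opens the file in text mode.
        with tokenize.open(path) as source_file:
            return source_file.read()

    with zipfile.ZipFile(path) as archive:
        raw = archive.read(member)
    # detect_encoding expects a readline callable that returns bytes.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(raw).readline)
    return raw.decode(encoding)
```

With something like this, a file starting with `# -*- coding: latin-1 -*-` would be decoded as Latin-1 whether it sits on disk or inside an archive, instead of depending on the locale default.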
