[jira] [Commented] (AIRFLOW-3921) Logging bytes fails in Python 2

Ash Berlin-Taylor (JIRA) Fri, 22 Feb 2019 01:35:23 -0800


    [ 
https://issues.apache.org/jira/browse/AIRFLOW-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16774936#comment-16774936
 ]


Ash Berlin-Taylor commented on AIRFLOW-3921:
--------------------------------------------

We import `unicode_literals` to (try to) make the code behave the same on 
Python 2 and Python 3.

Because of this, I think the only fix is for your code to log unicode strings, 
not bytes. I.e. don't log `'\xc3\xa9aoeu'`, but `'\xc3\xa9aoeu'.decode('utf-8')`.
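
A minimal sketch of that suggestion, for illustration only: `to_text` is a hypothetical helper (not part of Airflow), and it assumes the bytes being logged are UTF-8 encoded. It decodes bytes to text before they reach the logger and passes text through unchanged.

```python
import logging


def to_text(message, encoding="utf-8"):
    """Return `message` as text, decoding it first if it is bytes.

    Hypothetical helper, not part of Airflow. Assumes UTF-8 by default.
    """
    if isinstance(message, bytes):
        # On Python 2, `bytes` is an alias for `str`, so plain byte
        # strings take this branch. errors="replace" avoids raising on
        # input that is not valid in the assumed encoding.
        return message.decode(encoding, errors="replace")
    return message


log = logging.getLogger(__name__)

raw = b"\xc3\xa9aoeu"       # UTF-8 encoded bytes, as in the report
log.info(to_text(raw))      # the logger now receives text, not bytes
```

Once the message is text, comparing it against a unicode `"\n"` no longer triggers an implicit ASCII decode.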


> Logging bytes fails in Python 2
> -------------------------------
>
>                 Key: AIRFLOW-3921
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3921
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: utils
>    Affects Versions: 1.10.2
>            Reporter: Maximilian Roos
>            Priority: Minor
>
> We just upgraded to 1.10.2. Thanks for the cadence of releases.
>  
> We've hit one small but critical issue though: when we log a Python 2 string 
> (i.e. bytes) that contains non-ASCII characters, Airflow raises an error.
>  
> This is because Airflow uses a `\n` literal that is a unicode string here: 
> [https://github.com/apache/airflow/blob/master/airflow/utils/log/logging_mixin.py#L102],
>  because `from __future__ import unicode_literals` is placed here: 
> [https://github.com/apache/airflow/blob/master/airflow/utils/log/logging_mixin.py#L23]
> (I think this is why, and the repro below supports that, but I'm frequently 
> hitting unicode issues, so please correct me if I'm mistaken)
>  
> You can see the issue reproduced:
>  
>  
> {code:python}
> # non-ascii character
> In [16]: print(u"\u00E9")
> é
> # non-ascii encoded into bytes
> In [11]: u"\u00E9aoeu".encode('utf-8')
> Out[11]: '\xc3\xa9aoeu'
> # works fine when compared with `b"\n"`
> In [18]: u"\u00E9aoeu".encode('utf-8').endswith(b"\n")
> Out[18]: False
> # fails when compared with `u"\n"`
> In [15]: '\xc3\xa9aoeu'.endswith(u"\n")
> ---------------------------------------------------------------------------
> UnicodeDecodeError                        Traceback (most recent call last)
> <ipython-input-15-93bd1ca7fa67> in <module>()
> ----> 1 '\xc3\xa9aoeu'.endswith(u"\n")
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: 
> ordinal not in range(128)
> {code}
>  
> I'm not sure there's any workaround without something as drastic as removing 
> the `from __future__ import unicode_literals`, or changing all our logging to 
> emit unicode (which would break lots of other processes in Python 2). Is 
> there any temporary workaround?
>  
> Thanks



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
