[jira] [Commented] (AIRFLOW-3921) Logging bytes fails in Python 2

Maximilian Roos (JIRA) Tue, 19 Feb 2019 10:07:10 -0800


    [ 
https://issues.apache.org/jira/browse/AIRFLOW-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772173#comment-16772173
 ]


Maximilian Roos commented on AIRFLOW-3921:
------------------------------------------

That said, these lines of code don't appear to have changed since 1.10.0 
(the version we're upgrading from), so I'm not sure Airflow is what's causing 
the new errors we're seeing.

> Logging bytes fails in Python 2
> -------------------------------
>
>                 Key: AIRFLOW-3921
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3921
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: utils
>    Affects Versions: 1.10.2
>            Reporter: Maximilian Roos
>            Priority: Minor
>
> We just upgraded to 1.10.2. Thanks for the cadence of releases.
>  
> We've hit one small but critical issue though: when we log a Python 2 string 
> (i.e. bytes) that contains non-ASCII characters, Airflow raises an error.
>  
> This is because Airflow uses a `\n` literal that is a unicode string here: 
> [https://github.com/apache/airflow/blob/master/airflow/utils/log/logging_mixin.py#L102]
> because `from __future__ import unicode_literals` is placed here: 
> [https://github.com/apache/airflow/blob/master/airflow/utils/log/logging_mixin.py#L23]
> (I think this is why, and the repro below supports that, but I'm frequently 
> hitting unicode issues, so please correct me if I'm mistaken)
>  
> You can see the issue reproduced:
>  
>  
> {code:python}
> # non-ascii character
> In [16]: print(u"\u00E9")
> é
> # non-ascii encoded into bytes
> In [11]: u"\u00E9aoeu".encode('utf-8')
> Out[11]: '\xc3\xa9aoeu'
> # works fine when compared with `b"\n"`
> In [18]: u"\u00E9aoeu".encode('utf-8').endswith(b"\n")
> Out[18]: False
> # fails when compared with `u"\n"`
> In [15]: '\xc3\xa9aoeu'.endswith(u"\n")
> ---------------------------------------------------------------------------
> UnicodeDecodeError                        Traceback (most recent call last)
> <ipython-input-15-93bd1ca7fa67> in <module>()
> ----> 1 '\xc3\xa9aoeu'.endswith(u"\n")
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: 
> ordinal not in range(128)
> {code}
>  
> I'm not sure there's any workaround short of something as drastic as removing 
> the `from __future__ import unicode_literals`, or changing all our logging to 
> emit unicode (which would break lots of other processes in Python 2). Is 
> there any temporary workaround?
>  
> Thanks
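
One possible stopgap, sketched here under assumptions rather than taken from
Airflow itself: compare the message against a newline of its own type, so
Python 2 never has to decode the bytes as ASCII implicitly. The helper name
`ends_with_newline` is hypothetical and does not exist in Airflow. The explicit
`.decode("ascii")` call stands in for the implicit conversion Python 2 performs
when bytes meet a unicode string, which is what the repro above trips over.

```python
# -*- coding: utf-8 -*-
# Sketch only: `ends_with_newline` is a hypothetical helper, not Airflow code.


def ends_with_newline(message):
    """Check for a trailing newline using a literal of the message's own type."""
    # In Python 2, `str` is `bytes`, so byte strings take the b"\n" branch
    # and unicode strings take the u"\n" branch. No implicit decode happens.
    newline = b"\n" if isinstance(message, bytes) else u"\n"
    return message.endswith(newline)


# Non-ASCII text encoded to UTF-8 bytes, as in the repro.
utf8_bytes = u"\u00e9aoeu".encode("utf-8")

# Python 2 decodes bytes as ASCII when mixing them with unicode;
# doing that decode explicitly raises the same UnicodeDecodeError.
try:
    utf8_bytes.decode("ascii")
    implicit_decode_failed = False
except UnicodeDecodeError:
    implicit_decode_failed = True

print(implicit_decode_failed)
print(ends_with_newline(utf8_bytes))
print(ends_with_newline(utf8_bytes + b"\n"))
print(ends_with_newline(u"\u00e9aoeu\n"))
```

Because the check never mixes the two string types, it behaves the same for
byte strings and unicode strings. Whether it can be applied without patching
the installed Airflow package depends on the deployment.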



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
