[ 
https://issues.apache.org/jira/browse/AIRFLOW-179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-179 started by John Bodley.
-------------------------------------------
> DbApiHook string serialization fails when string contains non-ASCII characters
> ------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-179
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-179
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: hooks
>            Reporter: John Bodley
>            Assignee: John Bodley
>
> The DbApiHook.insert_rows(...) method tries to serialize all values to 
> strings using the ASCII codec. This is problematic if the cell contains 
> non-ASCII characters, e.g.:
> >>> from airflow.hooks import DbApiHook
> >>> DbApiHook._serialize_cell('Nguyễn Tấn Dũng')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/local/lib/python2.7/dist-packages/airflow/hooks/dbapi_hook.py", line 196, in _serialize_cell
>     return "'" + str(cell).replace("'", "''") + "'"
>   File "/usr/local/lib/python2.7/dist-packages/future/types/newstr.py", line 102, in __new__
>     return super(newstr, cls).__new__(cls, value)
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 4: ordinal not in range(128)
> Rather than manually serializing values to an ASCII string, the hook 
> should serialize each value using the character set of the corresponding 
> target database, leveraging the connection to convert the object to a 
> SQL string literal.
> Note that an exception should still be thrown if the target encoding is 
> not compatible with the source encoding.
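
The failure and the proposed behaviour can be sketched as follows. This is a minimal illustration under assumptions, not the fix adopted for this issue: it targets Python 3, and the helper name `serialize_cell` and its `encoding` parameter (standing in for the target database's character set) are hypothetical rather than part of DbApiHook.

```python
# -*- coding: utf-8 -*-
# Sketch only (Python 3). `serialize_cell` and its `encoding` parameter are
# hypothetical; they are not part of the DbApiHook API.

def serialize_cell(cell, encoding="utf-8"):
    """Render a value as a quoted SQL string literal."""
    text = cell if isinstance(cell, str) else str(cell)
    # Fail loudly if the target character set cannot represent the text.
    text.encode(encoding)  # raises UnicodeEncodeError on incompatible input
    # Double embedded single quotes, as the original implementation does.
    return "'" + text.replace("'", "''") + "'"


# Why the original call fails: under Python 2 the name is a UTF-8 byte
# string, and decoding those bytes as ASCII stops at the first byte of "ễ".
raw = "Nguyễn Tấn Dũng".encode("utf-8")
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    failure = (exc.start, hex(raw[exc.start]))  # position and offending byte

literal = serialize_cell("Nguyễn Tấn Dũng")
```

Where the driver supports it, passing values as query parameters to `cursor.execute` leaves quoting and encoding to the driver, which avoids building literals by hand.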



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
