[ https://issues.apache.org/jira/browse/AIRFLOW-179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Work on AIRFLOW-179 started by John Bodley.
-------------------------------------------
> DbApiHook string serialization fails when string contains non-ASCII characters
> ------------------------------------------------------------------------------
>
> Key: AIRFLOW-179
> URL: https://issues.apache.org/jira/browse/AIRFLOW-179
> Project: Apache Airflow
> Issue Type: Bug
> Components: hooks
> Reporter: John Bodley
> Assignee: John Bodley
>
> The DbApiHook.insert_rows(...) method tries to serialize all values to
> strings using the ASCII codec. This is problematic if a cell contains
> non-ASCII characters, e.g.:
> >>> from airflow.hooks import DbApiHook
> >>> DbApiHook._serialize_cell('Nguyễn Tấn Dũng')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/local/lib/python2.7/dist-packages/airflow/hooks/dbapi_hook.py", line 196, in _serialize_cell
>     return "'" + str(cell).replace("'", "''") + "'"
>   File "/usr/local/lib/python2.7/dist-packages/future/types/newstr.py", line 102, in __new__
>     return super(newstr, cls).__new__(cls, value)
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 4: ordinal not in range(128)
>
> Rather than manually serializing values to an ASCII string, the hook should
> serialize each value using the character set of the corresponding target
> database, leveraging the connection to convert the object into an SQL
> string literal.
>
> Note that an exception should still be raised if the target encoding is not
> compatible with the source encoding.
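The following is a minimal Python 3 sketch of the failure and of the direction proposed above. It makes no claims about Airflow's actual code. The names `serialize_cell_ascii_py2_style`, `serialize_cell` and `insert_rows_parameterized` are hypothetical helpers, not part of DbApiHook's API, and `target_encoding` is an assumed stand-in for the target database's character set.

- The first helper mimics Python 2's implicit ASCII decode of a UTF-8 byte string inside `str()`.
- The second quotes text only after a strict encode check against the target encoding, so incompatible values still raise.
- The third uses the standard-library `sqlite3` driver with `?` placeholders, which lets the driver handle quoting and encoding.

```python
import sqlite3


def serialize_cell_ascii_py2_style(raw):
    """Hypothetical: mimic Python 2's str() on a UTF-8 byte string.

    Python 2 implicitly decoded byte strings with the ASCII codec, so any
    byte >= 0x80 raised UnicodeDecodeError before quoting could happen.
    """
    text = raw.decode("ascii")  # fails on the first non-ASCII byte
    return "'" + text.replace("'", "''") + "'"


def serialize_cell(cell, target_encoding="utf-8"):
    """Hypothetical: render a value as an SQL literal without forcing ASCII.

    target_encoding stands in for the target database's character set
    (an assumption; a real hook would obtain it from the connection).
    """
    if cell is None:
        return "NULL"
    if isinstance(cell, str):
        # Strict encode: raises UnicodeEncodeError if the target character
        # set cannot represent the value, so incompatibility still fails loudly.
        cell.encode(target_encoding)
        return "'" + cell.replace("'", "''") + "'"
    return str(cell)


def insert_rows_parameterized(conn, table, rows):
    """Hypothetical: insert rows via DB-API placeholders instead of literals.

    The driver performs quoting and encoding. `table` is interpolated
    directly and must be trusted, since placeholders cannot bind identifiers.
    """
    rows = list(rows)
    if not rows:
        return
    placeholders = ", ".join("?" for _ in rows[0])  # sqlite3 uses qmark style
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    conn.commit()


if __name__ == "__main__":
    name = "Nguyễn Tấn Dũng"
    try:
        serialize_cell_ascii_py2_style(name.encode("utf-8"))
    except UnicodeDecodeError as exc:
        print(exc)  # 'ascii' codec can't decode byte 0xe1 in position 4: ...
    print(serialize_cell(name))  # 'Nguyễn Tấn Dũng'

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (name TEXT)")
    insert_rows_parameterized(conn, "people", [(name,), ("O'Brien",)])
    print(conn.execute("SELECT name FROM people ORDER BY rowid").fetchall())
```

The `?` placeholder style is specific to `sqlite3`, whose `paramstyle` is `qmark`. Other DB-API drivers may expect `%s` or named placeholders, so check the driver module's `paramstyle` attribute. Binding parameters is typically preferable to building literals by hand, because the driver already deals with the connection's character set.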
--
This message was sent by Atlassian JIRA (v6.3.4#6332)