[ https://issues.apache.org/jira/browse/AIRFLOW-179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Riccomini updated AIRFLOW-179:
------------------------------------
External issue URL: https://github.com/apache/incubator-airflow/pull/1550
> DbApiHook string serialization fails when string contains non-ASCII characters
> ------------------------------------------------------------------------------
>
> Key: AIRFLOW-179
> URL: https://issues.apache.org/jira/browse/AIRFLOW-179
> Project: Apache Airflow
> Issue Type: Bug
> Components: hooks
> Reporter: John Bodley
> Assignee: John Bodley
>
> The DbApiHook.insert_rows(...) method tries to serialize all values to
> strings using the ASCII codec. This is problematic if a cell contains
> non-ASCII characters, e.g.:
> >>> from airflow.hooks import DbApiHook
> >>> DbApiHook._serialize_cell('Nguyễn Tấn Dũng')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/local/lib/python2.7/dist-packages/airflow/hooks/dbapi_hook.py", line 196, in _serialize_cell
>     return "'" + str(cell).replace("'", "''") + "'"
>   File "/usr/local/lib/python2.7/dist-packages/future/types/newstr.py", line 102, in __new__
>     return super(newstr, cls).__new__(cls, value)
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 4: ordinal not in range(128)
> Rather than manually serializing and escaping values into an ASCII string,
> the method should serialize each value using the character set of the
> target database, leveraging the connection to convert the object into a
> SQL string literal.
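
One way to read the suggestion above is to hand the values to the driver through DB-API parameter binding instead of building SQL literals by hand. The following is a minimal sketch under assumptions, not the change made in the linked pull request: it uses Python 3 and the standard-library sqlite3 module as a stand-in for the target database, the helper name insert_rows_bound is hypothetical, and the "?" placeholder style is specific to sqlite3 (other drivers may expect "%s" or named parameters).

```python
import sqlite3


def insert_rows_bound(conn, table, rows):
    """Hypothetical helper: insert rows via parameter binding.

    The driver converts each value for the target database, so no manual
    quoting or ASCII conversion happens in this code.
    """
    if not rows:
        return 0
    # One "?" placeholder per column (sqlite3 uses the qmark paramstyle).
    placeholders = ", ".join("?" for _ in rows[0])
    # Table names cannot be bound as parameters; assumed trusted here.
    sql = "INSERT INTO {} VALUES ({})".format(table, placeholders)
    conn.executemany(sql, rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, note TEXT)")

# A non-ASCII value and a value containing a single quote, both unescaped.
inserted = insert_rows_bound(conn, "people", [("Nguyễn Tấn Dũng", "you're")])
stored = conn.execute("SELECT name, note FROM people").fetchall()
```

With binding, the non-ASCII name and the embedded single quote are stored unchanged, and the calling code never constructs a quoted literal itself.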
> Additionally, the escaping logic for single quotes (') within the
> _serialize_cell method seems wrong, i.e.
> str(cell).replace("'", "''")
> would escape the string "you're" to "'you''re'" as opposed to "'you\'re'".
> Note that an exception should still be thrown if the target encoding is
> not compatible with the source encoding.
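
The last point, that an incompatible target encoding should still raise, can be illustrated with Python's strict encoding mode. This is a sketch only, assuming Python 3; the helper name encode_for_target and the choice of "utf-8" and "latin-1" as example target character sets are assumptions for illustration and do not come from the issue.

```python
def encode_for_target(value, charset):
    """Hypothetical helper: encode text for a target character set.

    errors="strict" (the default) raises UnicodeEncodeError rather than
    silently dropping or replacing characters the charset cannot hold.
    """
    return value.encode(charset, errors="strict")


name = "Nguyễn Tấn Dũng"

# UTF-8 can represent every character in the name.
encoded = encode_for_target(name, "utf-8")

# Latin-1 has no code point for characters such as "ễ", so this must fail.
try:
    encode_for_target(name, "latin-1")
    failed = False
except UnicodeEncodeError:
    failed = True
```

Keeping the strict mode means data loss surfaces as an error at insert time instead of as corrupted rows later.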
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)