[
https://issues.apache.org/jira/browse/AIRFLOW-179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
John Bodley updated AIRFLOW-179:
--------------------------------
Description:
The DbApiHook.insert_rows(...) method tries to serialize all values to strings
using the ASCII codec. This is problematic if a cell contains non-ASCII
characters, e.g.
    >>> from airflow.hooks import DbApiHook
    >>> DbApiHook._serialize_cell('Nguyễn Tấn Dũng')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python2.7/dist-packages/airflow/hooks/dbapi_hook.py", line 196, in _serialize_cell
        return "'" + str(cell).replace("'", "''") + "'"
      File "/usr/local/lib/python2.7/dist-packages/future/types/newstr.py", line 102, in __new__
        return super(newstr, cls).__new__(cls, value)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 4: ordinal not in range(128)
Rather than manually serializing and escaping values to an ASCII string, one
should serialize each value using the character set of the corresponding target
database, leveraging the connection to convert the object to a SQL string
literal.
Note that an exception should still be thrown if the target encoding is not
compatible with the source encoding.
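As a rough illustration of the proposal above, the following Python 3 sketch lets the driver bind values instead of building string literals by hand. It is not the Airflow fix: it uses the standard-library sqlite3 module as a stand-in for the target database, and the helper names insert_rows and check_encodable as well as the people table are hypothetical.

```python
import sqlite3


def insert_rows(conn, table, rows):
    """Insert rows via driver-side parameter binding (hypothetical helper).

    The driver handles quoting and encoding, so no manual escaping is needed.
    """
    if not rows:
        return
    placeholders = ", ".join("?" for _ in rows[0])
    # The table name is interpolated directly, so it must come from trusted code.
    sql = "INSERT INTO {} VALUES ({})".format(table, placeholders)
    conn.executemany(sql, rows)
    conn.commit()


def check_encodable(value, charset):
    """Raise UnicodeEncodeError if value cannot be represented in charset."""
    value.encode(charset)
    return value


# In-memory database used purely for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
insert_rows(conn, "people", [("Nguyễn Tấn Dũng",), ("O'Brien",)])
names = [row[0] for row in conn.execute("SELECT name FROM people ORDER BY rowid")]
```

Both the non-ASCII name and the name containing a single quote round-trip unchanged, while check_encodable("Nguyễn", "latin-1") raises UnicodeEncodeError, mirroring the requirement that an incompatible target encoding should still fail loudly.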
was:
The DbApiHook.insert_rows(...) method tries to serialize all values to strings
using the ASCII codec. This is problematic if a cell contains non-ASCII
characters, e.g.
    >>> from airflow.hooks import DbApiHook
    >>> DbApiHook._serialize_cell('Nguyễn Tấn Dũng')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python2.7/dist-packages/airflow/hooks/dbapi_hook.py", line 196, in _serialize_cell
        return "'" + str(cell).replace("'", "''") + "'"
      File "/usr/local/lib/python2.7/dist-packages/future/types/newstr.py", line 102, in __new__
        return super(newstr, cls).__new__(cls, value)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 4: ordinal not in range(128)
Rather than manually serializing values to an ASCII string, one should
serialize each value using the character set of the corresponding target
database, leveraging the connection to convert an object to a SQL string
literal.
Note that an exception should still be thrown if the target encoding is not
compatible with the source encoding.
> DbApiHook string serialization fails when string contains non-ASCII characters
> ------------------------------------------------------------------------------
>
> Key: AIRFLOW-179
> URL: https://issues.apache.org/jira/browse/AIRFLOW-179
> Project: Apache Airflow
> Issue Type: Bug
> Components: hooks
> Reporter: John Bodley
> Assignee: John Bodley