[ 
https://issues.apache.org/jira/browse/AIRFLOW-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304585#comment-15304585
 ] 

ASF subversion and git services commented on AIRFLOW-179:
---------------------------------------------------------

Commit 8f63640584ca2dcd15bcd361d1f9a0d995bad315 in incubator-airflow's branch 
refs/heads/master from [~artwr]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=8f63640 ]

Revert "[AIRFLOW-179] DbApiHook string serialization fails when string contains 
non-ASCII characters"

This reverts commit 87b4b8fa19cb660317198d74f6d51fdde0a7e067.

Reverting as the method used in the dbapi hook is actually package
specific to MySQLdb and would break the sqlite and mssql hooks.


> DbApiHook string serialization fails when string contains non-ASCII characters
> ------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-179
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-179
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: hooks
>            Reporter: John Bodley
>            Assignee: John Bodley
>             Fix For: Airflow 1.8
>
>
> The DbApiHook.insert_rows(...) method tries to serialize all values to 
> strings using the ASCII codec,  this is problematic if the cell contains 
> non-ASCII characters, i.e.
>     >>> from airflow.hooks import DbApiHook
>     >>> DbApiHook._serialize_cell('Nguyễn Tấn Dũng')
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>       File 
> "/usr/local/lib/python2.7/dist-packages/airflow/hooks/dbapi_hook.py", line 
> 196, in _serialize_cell
>         return "'" + str(cell).replace("'", "''") + "'"
>       File "/usr/local/lib/python2.7/dist-packages/future/types/newstr.py", 
> line 102, in __new__
>         return super(newstr, cls).__new__(cls, value)
>     UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 4: 
> ordinal not in range(128)
> Rather than manually trying to serialize and escape values to an ASCII string 
> one should try to serialize the value to string using the character set of 
> the corresponding target database leveraging the connection to mutate the 
> object to the SQL string literal.
> Additionally the escaping logic for single quotes (') within the 
> _serialize_cell method seems wrong, i.e. 
>     str(cell).replace("'", "''")
> would escape the string "you're" to be "'you''ve'" as opposed to "'you\'ve'".
> Note an exception should still be thrown if the target encoding is not 
> compatible with the source encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to