[ 
https://issues.apache.org/jira/browse/AIRFLOW-1862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fokko Driesprong resolved AIRFLOW-1862.
---------------------------------------
       Resolution: Fixed
    Fix Version/s:     (was: Airflow 1.8)
                   2.0.0

> redshift_to_s3_operator fails on BOOLEAN column in source table
> ---------------------------------------------------------------
>
>                 Key: AIRFLOW-1862
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1862
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: redshift
>    Affects Versions: Airflow 1.8
>            Reporter: Mark S Weiss
>            Assignee: Mark S Weiss
>            Priority: Major
>              Labels: easyfix
>             Fix For: 2.0.0
>
>
> The {{airflow/operators/redshift_to_s3_operator}} module generates an 
> {{UNLOAD}} query using an SQL fragment for the {{column_castings}} that is 
> generated by this line: 
> https://github.com/apache/incubator-airflow/blob/master/airflow/operators/redshift_to_s3_operator.py#L87
> This is a bug, because a {{CAST()}} in Redshift on a column of type 
> {{BOOLEAN}} will raise an error and abort execution of the SQL statement. The 
> error raised is {{ERROR:  cannot cast type boolean to character varying}}.
> This can be trivially verified in Redshift using the following code:
> {noformat}
> my_db=# CREATE TABLE temp (BOOLEAN flag);
> my_db=# INSERT INTO temp (flag) VALUES(false);
> my_db=# SELECT CAST (flag AS text) FROM temp;
> ERROR:  cannot cast type boolean to character varying
> {noformat}
> The solution is to handle the case of {{BOOLEAN}} columns, by:
> * Modifying the {{columns_query}} here 
> https://github.com/apache/incubator-airflow/blob/master/airflow/operators/redshift_to_s3_operator.py#L75
>  to also select column {{data_type}}
> * Modify the expression here to use the {{data_type}} to generate a valid 
> {{CAST}} here: 
> https://github.com/apache/incubator-airflow/blob/master/airflow/operators/redshift_to_s3_operator.py#L87
> I have implemented and locally tested this alternative, which I believe is 
> both minimal in scope and robust in all cases, and will submit a pull request.
> {noformat}
>         columns_query = """SELECT column_name, data_type
>                             FROM information_schema.columns
>                             WHERE table_schema = '{0}'
>                             AND   table_name = '{1}'
>                             ORDER BY ordinal_position
>                         """.format(self.schema, self.table)
>         ....
>         ....
>         column_castings = (', ').join(["CAST({0} AS text) AS 
> {1}".format(columns[i], columns[i])
>                                        if types[i] != 'boolean' else
>                                        "CAST(CAST({0} AS SMALLINT)AS text) AS 
> {1}".format(columns[i], columns[i])
>                                        for i in xrange(len(columns))])
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to