[ 
https://issues.apache.org/jira/browse/AIRFLOW-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16439111#comment-16439111
 ] 

ASF subversion and git services commented on AIRFLOW-2254:
----------------------------------------------------------

Commit a148043107f147ce7d3617308f119be27810ec5a in incubator-airflow's branch 
refs/heads/master from [~sathyaprakashg]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=a148043 ]

[AIRFLOW-2254] Put header as first row in unload

Currently, the unloaded data is ordered by the first
column in descending order, so the header row comes
first only if the first column is an integer.
This fix puts the header in the first row regardless
of the first column's data type.

Closes #3180 from sathyaprakashg/AIRFLOW-2254
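
The ordering problem described in the commit message can be illustrated with a small Python sketch. It assumes every value has been cast to text and that strings compare by code point, which for ASCII data matches a byte-order sort; the sample values and the helper name first_row_desc are made up for illustration and are not taken from the Airflow code.

```python
# First-column cells as text: the header cell plus a few data cells.
numeric_first = ["id", "17", "4", "230"]
text_first = ["name", "alice", "zoe", "Bob"]


def first_row_desc(values):
    """Return the value that sorts first under a descending sort."""
    return sorted(values, reverse=True)[0]


# Letters sort after digits, so the header wins when the data is numeric.
print(first_row_desc(numeric_first))
# With character data, any value greater than the header sorts ahead of it.
print(first_row_desc(text_first))
```

In the second case a data value is returned instead of the header, which is the situation the fix addresses.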


> Fix header output on RedshiftToS3Transfer
> -----------------------------------------
>
>                 Key: AIRFLOW-2254
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2254
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: aws, redshift
>            Reporter: Kengo Seki
>            Assignee: Sathyaprakash Govindasamy
>            Priority: Major
>             Fix For: 2.0.0
>
>
> The current implementation of RedshiftToS3Transfer is shown below; it
> appears to be based on [this
> post|https://medium.com/carwow-product-engineering/unloading-a-file-from-redshift-to-s3-with-headers-fb707f5480f7].
> {code}
>         unload_query = """
>                         UNLOAD ('SELECT {0}
>                         UNION ALL
>                         SELECT {1} FROM {2}.{3}
>                         ORDER BY 1 DESC')
>                         TO 's3://{4}/{5}/{3}_'
>                         with
>                         credentials 'aws_access_key_id={6};aws_secret_access_key={7}'
>                         {8};
>                         """.format(column_names, column_castings,
>                                    self.schema, self.table,
>                                    self.s3_bucket, self.s3_key,
>                                    credentials.access_key,
>                                    credentials.secret_key, unload_options)
> {code}
> {{ORDER BY 1 DESC}} is intended to output the header first, but as [this
> post|https://stackoverflow.com/questions/24681214/unloading-from-redshift-to-s3-with-headers#answer-26443374]
> says, it works only if the first column type is not character (e.g. numeric).
> In addition, this query should be used with the PARALLEL OFF option;
> without it, multiple files are written but only the first one has the
> header line.
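
One possible way to keep the header first for any column type is to sort on an explicit ordinal column rather than on the first data column. The sketch below is only an illustration of that idea and is not necessarily what the commit does: the function build_unload_query, the sort_key column and the subquery alias t are hypothetical names, and the column list is assumed to be already known and safe to interpolate.

```python
def build_unload_query(schema, table, columns, s3_bucket, s3_key,
                       access_key, secret_key, unload_options=()):
    """Build an UNLOAD statement whose header row sorts first for any column type."""
    # Header literals; quotes are backslash-escaped because the whole
    # SELECT sits inside the quoted argument of UNLOAD ('...').
    header = ", ".join("\\'{0}\\' AS {0}".format(c) for c in columns)
    # Cast data columns to text so both sides of UNION ALL have matching types.
    data = ", ".join("CAST({0} AS text) AS {0}".format(c) for c in columns)
    names = ", ".join(columns)

    # Add PARALLEL OFF unless the caller already set a PARALLEL option.
    options = list(unload_options)
    if not any(o.strip().upper().startswith("PARALLEL") for o in options):
        options.append("PARALLEL OFF")

    # sort_key (hypothetical) is 0 for the header and 1 for data rows; it is
    # used for ordering only and is left out of the outer select list.
    return (
        "UNLOAD ('SELECT {names} FROM ("
        "SELECT 0 AS sort_key, {header} "
        "UNION ALL "
        "SELECT 1 AS sort_key, {data} FROM {schema}.{table}"
        ") AS t ORDER BY sort_key') "
        "TO 's3://{bucket}/{key}/{table}_' "
        "WITH CREDENTIALS "
        "'aws_access_key_id={access_key};aws_secret_access_key={secret_key}' "
        "{options};"
    ).format(names=names, header=header, data=data, schema=schema, table=table,
             bucket=s3_bucket, key=s3_key, access_key=access_key,
             secret_key=secret_key, options=" ".join(options))


print(build_unload_query("public", "users", ["name", "id"],
                         "my-bucket", "exports", "<access_key>", "<secret_key>"))
```

The sketch would break if the table itself had a column named sort_key, so a real implementation would need a name that cannot collide.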



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
