[
https://issues.apache.org/jira/browse/AIRFLOW-1663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196430#comment-16196430
]
Andy Hadjigeorgiou commented on AIRFLOW-1663:
---------------------------------------------
It may make more sense to extend the Postgres connection into a 'RedshiftDB'
connection, and reserve the 'Redshift' keyword for Redshift cluster management
rather than queries. This would keep naming consistent with any boto-based
hooks and operators.
> Redshift Connection, Hook, & Operator for COPY command usability
> ----------------------------------------------------------------
>
> Key: AIRFLOW-1663
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1663
> Project: Apache Airflow
> Issue Type: New Feature
> Components: hooks, operators
> Reporter: Andy Hadjigeorgiou
> Assignee: Andy Hadjigeorgiou
> Priority: Minor
>
> I'm using Redshift as a data warehouse with Airflow, and it wasn't
> immediately apparent that Airflow had the hooks/connections to support
> Redshift. In practice, because Redshift is based on Postgres, a Postgres hook
> works for basic commands. However, running a COPY command (Redshift's variant
> of COPY, which loads data in parallel) takes more work, because AWS
> credentials must be included, and ideally those credentials live in a
> connection rather than in version control. Redshift's UNLOAD-to-S3 feature
> would also benefit from a solution where credentials are stored in a
> connection.
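> To make the credentials problem concrete, here is a minimal sketch of the
> SQL a hook would have to assemble. It assumes key-based authentication via
> Redshift's CREDENTIALS parameter; the helper names (credentials_clause,
> build_copy, build_unload) are hypothetical, and the code does no escaping of
> identifiers or secrets, so it is illustrative only.

```python
def credentials_clause(access_key_id: str, secret_access_key: str) -> str:
    # Key-based form of Redshift's CREDENTIALS parameter (assumed auth method).
    return ("CREDENTIALS 'aws_access_key_id={};aws_secret_access_key={}'"
            .format(access_key_id, secret_access_key))


def build_copy(table: str, s3_path: str, access_key_id: str,
               secret_access_key: str, options: str = "") -> str:
    # COPY loads data from S3 into a table; the keys are embedded in the SQL
    # text, which is why they should come from a connection, not the DAG file.
    parts = [f"COPY {table} FROM '{s3_path}'",
             credentials_clause(access_key_id, secret_access_key)]
    if options:
        parts.append(options)  # e.g. "CSV" or "GZIP"
    return " ".join(parts) + ";"


def build_unload(select_query: str, s3_prefix: str, access_key_id: str,
                 secret_access_key: str, options: str = "") -> str:
    # UNLOAD takes the query as a quoted literal, so single quotes inside
    # the query are doubled.
    quoted = select_query.replace("'", "''")
    parts = [f"UNLOAD ('{quoted}') TO '{s3_prefix}'",
             credentials_clause(access_key_id, secret_access_key)]
    if options:
        parts.append(options)
    return " ".join(parts) + ";"
```

> Both statements need the same credentials string, which is the argument for
> storing the keys once, in a connection, and building the clause in a hook.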
> My proposed solution is to add a Redshift connection that stores AWS
> credentials along with the Redshift database credentials (similar to an S3
> connection). From there, I'll create a RedshiftHook (probably an extension of
> PostgresHook) and a RedshiftOperator, which would simplify running Redshift
> SQL queries that need AWS credentials (perhaps using psycopg2's copy_expert
> method).
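> A rough sketch of what such a combined connection might hold, assuming the
> database fields mirror Airflow's Postgres connection (host, port, schema as
> the database name, login, password) and the AWS keys sit in the JSON
> `extra` field under aws_access_key_id / aws_secret_access_key. The
> RedshiftConnection class and its methods are hypothetical, not an existing
> Airflow API.

```python
import json
from dataclasses import dataclass


@dataclass
class RedshiftConnection:
    """Hypothetical connection: Postgres-style fields plus AWS keys in `extra`."""
    host: str
    port: int
    schema: str      # database name, as in the Postgres connection
    login: str
    password: str
    extra: str = "{}"  # JSON string; assumed to carry the AWS keys

    def db_params(self) -> dict:
        # Keyword arguments in the form psycopg2.connect() accepts.
        return {"host": self.host, "port": self.port, "dbname": self.schema,
                "user": self.login, "password": self.password}

    def aws_credentials(self) -> tuple:
        # Pull the AWS keys out of `extra`, failing loudly if either is absent.
        extra = json.loads(self.extra or "{}")
        try:
            return extra["aws_access_key_id"], extra["aws_secret_access_key"]
        except KeyError as missing:
            raise ValueError(f"connection extra is missing {missing}") from None
```

> A hook extending PostgresHook could then connect with db_params() as usual
> and use aws_credentials() only when building COPY or UNLOAD statements, so
> no secret ever appears in a DAG file.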
> It's my first time posting here, and I'm looking to contribute meaningfully,
> so any feedback on this feature would be much appreciated! I read that
> contributions of new hooks and operators are welcome, and that features in
> line with the project Roadmap are ideal ("Adding features already offered by
> existing workflow solutions (i.e. we need to add expected features)").
> Currently, Airflow only supports Redshift because of its basis in Postgres;
> more native support would bring Airflow in line with other workflow
> solutions and attract more Redshift users.
> I've already started work on this feature; once I clean it up, I'll post it
> here.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)