[
https://issues.apache.org/jira/browse/AIRFLOW-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16186947#comment-16186947
]
ASF subversion and git services commented on AIRFLOW-1560:
----------------------------------------------------------
Commit 71400b9d89f9faa49b03fccede4df4b85ac1475d in incubator-airflow's branch
refs/heads/v1-9-test from sid.gupta
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=71400b9 ]
[AIRFLOW-1560] Add AWS DynamoDB hook and operator for inserting batch items
Closes #2587 from
sid88in/feature/dynamodb_hook_and_operator
(cherry picked from commit 2f0798fcc9b7d6c0977b3190670d8a2c03818dd5)
Signed-off-by: Bolke de Bruin <[email protected]>
> Add AWS DynamoDB hook for inserting batch items
> -----------------------------------------------
>
> Key: AIRFLOW-1560
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1560
> Project: Apache Airflow
> Issue Type: New Feature
> Components: aws, boto3, hooks
> Reporter: Siddharth
> Assignee: Siddharth
>
> This PR integrates Airflow with AWS DynamoDB.
> Currently there is no hook for reading or writing DynamoDB items (single or
> batch insertions). To get started, we want to push data into DynamoDB using
> Airflow jobs scheduled daily. The idea is to read aggregates from Hive and
> push them into DynamoDB, with a write job running every day. First we want
> to create a DynamoDB hook (which this PR adds) and then create an operator
> to move data from Hive to DynamoDB (a Hive to DynamoDB transfer operator
> has been added).
> I noticed that Airflow currently has AWS_HOOK (the parent hook for connecting
> to AWS using credentials stored in configs). It has a function to connect to
> AWS objects using the Client API
> (http://boto3.readthedocs.io/en/latest/reference/services/dynamodb.html#client)
> which is specific to EMR_HOOK. For inserting data, however, we can use the
> DynamoDB Resource API
> (http://boto3.readthedocs.io/en/latest/reference/services/dynamodb.html#service-resource),
> which provides higher-level abstractions for inserting data into DynamoDB.
> A good question is how a client differs from a resource, and why one would
> use one or the other. "Resources are higher-level abstraction than the
> raw, low-level calls made by service clients. They can't do anything the
> clients can't do, but in many cases they are nicer to use. The downside is
> that they don't always support 100% of the features of a service."
> (http://boto3.readthedocs.io/en/latest/guide/resources.html)
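
To make the Resource API approach described above concrete, here is a minimal
sketch of a batch insert using boto3's Table resource and its batch_writer()
context manager. It is not the code from the PR: the function names get_table
and write_batch_data are hypothetical, and the sketch assumes boto3 is
installed and AWS credentials are available when get_table is called.

```python
def get_table(table_name, region_name=None):
    """Return a boto3 DynamoDB Table resource.

    Hypothetical helper: requires boto3 and valid AWS credentials.
    """
    # Imported lazily so write_batch_data stays usable without boto3 installed.
    import boto3

    dynamodb = boto3.resource("dynamodb", region_name=region_name)
    return dynamodb.Table(table_name)


def write_batch_data(table, items):
    """Insert items via the Resource API batch writer and return the count.

    `table` is expected to behave like a boto3 Table resource. The batch
    writer buffers the puts and sends them as batched write requests.
    """
    count = 0
    with table.batch_writer() as batch:
        for item in items:
            # With the Resource API, items are plain Python dicts; no
            # explicit DynamoDB type descriptors are needed.
            batch.put_item(Item=item)
            count += 1
    return count
```

A call might look like write_batch_data(get_table("my_table"), rows), where
"my_table" and rows are placeholders for an existing table name and a list of
dicts keyed by that table's attributes.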
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)