[ 
https://issues.apache.org/jira/browse/AIRFLOW-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16186947#comment-16186947
 ] 

ASF subversion and git services commented on AIRFLOW-1560:
----------------------------------------------------------

Commit 71400b9d89f9faa49b03fccede4df4b85ac1475d in incubator-airflow's branch 
refs/heads/v1-9-test from sid.gupta
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=71400b9 ]

[AIRFLOW-1560] Add AWS DynamoDB hook and operator for inserting batch items

Closes #2587 from
sid88in/feature/dynamodb_hook_and_operator

(cherry picked from commit 2f0798fcc9b7d6c0977b3190670d8a2c03818dd5)
Signed-off-by: Bolke de Bruin <[email protected]>


> Add AWS DynamoDB hook for inserting batch items
> -----------------------------------------------
>
>                 Key: AIRFLOW-1560
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1560
>             Project: Apache Airflow
>          Issue Type: New Feature
>          Components: aws, boto3, hooks
>            Reporter: Siddharth
>            Assignee: Siddharth
>
> This PR adds Airflow integration with AWS DynamoDB.
> Currently there is no hook for interacting with DynamoDB, whether for reading
> or writing items (single or batch insertions). To get started, we want to
> push data into DynamoDB using Airflow jobs (scheduled daily). The idea is to
> read aggregates from Hive and push them into DynamoDB (a write job will run
> every day to make this happen). First we want to create a DynamoDB hook
> (which this PR does) and then create an operator to move data from Hive to
> DynamoDB (a Hive-to-DynamoDB transfer operator has been added).
> I noticed that Airflow currently has AWS_HOOK (the parent hook for connecting
> to AWS using credentials stored in configs). It has a function to connect to
> AWS objects using the Client API
> (http://boto3.readthedocs.io/en/latest/reference/services/dynamodb.html#client),
> which is specific to EMR_HOOK. For inserting data, however, we can use the
> DynamoDB Resource API
> (http://boto3.readthedocs.io/en/latest/reference/services/dynamodb.html#service-resource),
> which provides higher-level abstractions for inserting data into DynamoDB.
> A good question to ask is how a client differs from a resource and why you
> would use one or the other. "Resources are higher-level abstraction than the
> raw, low-level calls made by service clients. They can't do anything the 
> clients can't do, but in many cases they are nicer to use. The downside is 
> that they don't always support 100% of the features of a service." 
> (http://boto3.readthedocs.io/en/latest/guide/resources.html) 
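
To make the client/resource distinction above concrete, here is a minimal sketch of a batch insert through the boto3 Resource API. It is not the code from the PR: the function names `get_table` and `put_items_in_batch`, the table name, and the region are hypothetical and chosen only for illustration. The boto3 calls used (`boto3.resource`, `Table`, `batch_writer`, `put_item`) are the ones documented at the links above.

```python
def get_table(table_name, region_name=None):
    """Return a boto3 DynamoDB Table resource.

    Requires boto3 and valid AWS credentials. boto3 is imported lazily
    so that put_items_in_batch below can be exercised without it.
    """
    import boto3

    dynamodb = boto3.resource("dynamodb", region_name=region_name)
    return dynamodb.Table(table_name)


def put_items_in_batch(table, items):
    """Insert an iterable of plain dicts into `table` and return the count.

    batch_writer() buffers the put requests, sends them in batches, and
    resends any unprocessed items before the context manager exits.
    """
    count = 0
    with table.batch_writer() as batch:
        for item in items:
            # Resource API: items are native Python values.
            batch.put_item(Item=item)
            count += 1
    return count


# Hypothetical usage (table name and region are assumptions):
#   table = get_table("daily_aggregates", region_name="us-east-1")
#   put_items_in_batch(table, [{"id": "1", "total": 42}])
#
# The lower-level Client API expects typed attribute values instead, e.g.
#   client.put_item(TableName="daily_aggregates",
#                   Item={"id": {"S": "1"}, "total": {"N": "42"}})
```

The helper takes the table object as an argument rather than creating it, so the AWS connection (for example, one built from credentials managed by a parent hook) stays separate from the write logic.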


