o-nikolas commented on code in PR #30501:
URL: https://github.com/apache/airflow/pull/30501#discussion_r1173127824
##########
airflow/providers/amazon/aws/transfers/dynamodb_to_s3.py:
##########
@@ -118,11 +126,40 @@ def __init__(
self.dynamodb_scan_kwargs = dynamodb_scan_kwargs
self.s3_bucket_name = s3_bucket_name
self.s3_key_prefix = s3_key_prefix
+ self.export_time = export_time
+ self.export_format = export_format
- def execute(self, context: Context) -> None:
- hook = DynamoDBHook(aws_conn_id=self.source_aws_conn_id)
- table = hook.get_conn().Table(self.dynamodb_table_name)
+ if self.export_time and self.export_time > datetime.now():
+ raise ValueError("The export_time parameter cannot be a future time.")
+ @cached_property
+ def hook(self):
+ """Create DynamoDBHook"""
+ return DynamoDBHook(aws_conn_id=self.source_aws_conn_id)
+
+ def execute(self, context: Context) -> None:
+ if self.export_time:
+ self._export_table_to_point_in_time()
+ else:
+ self._export_entire_data()
+
+ def _export_table_to_point_in_time(self):
+ """
+ Export data from start of epoc till `export_time`. Table export will be a snapshot of the table's
+ state at this point in time.
+ """
+ waiter = self.hook.get_waiter(CUSTOM_WAITER_NAME)
+ waiter.wait(
Review Comment:
Just for my own education: usually a waiter is used after an initial API
action. For example, an API request creates some resource, and then the waiter
periodically calls an API like describe in a loop until the desired state is
returned (CREATED, maybe, in my example). But here you're using the waiter both
to start the export and to wait until it's completed, very neat! I'm curious
about idempotency, though: the waiter is currently configured to poll every
30s, so if the export takes longer than 30s, a second request will be sent to
the API. Is that okay? Are there idempotency or consistency issues with this?
Or have I entirely misunderstood this change? :laughing:
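
For comparison, the usual "start once, then poll a describe call" pattern mentioned above could be sketched roughly as follows. This is a minimal sketch, not the PR's implementation: `FakeDynamoDBClient` and `export_and_wait` are hypothetical names made up for illustration, and the fake only mimics the shape of the boto3 DynamoDB client methods `export_table_to_point_in_time` and `describe_export` so the snippet runs without AWS access. Whether a `ClientToken` makes a repeated start request idempotent is an assumption to verify against the DynamoDB API documentation.

```python
import time
import uuid


class FakeDynamoDBClient:
    """Hypothetical stand-in for a boto3 DynamoDB client so the sketch runs offline."""

    def __init__(self, polls_until_done=3):
        self.start_calls = 0
        self.describe_calls = 0
        self._polls_until_done = polls_until_done

    def export_table_to_point_in_time(self, **kwargs):
        # Mutating call: each invocation would start a new export in a real service.
        self.start_calls += 1
        return {
            "ExportDescription": {
                "ExportArn": "arn:fake:export/1",
                "ExportStatus": "IN_PROGRESS",
            }
        }

    def describe_export(self, ExportArn):
        # Read-only call: safe to repeat as often as needed.
        self.describe_calls += 1
        done = self.describe_calls >= self._polls_until_done
        status = "COMPLETED" if done else "IN_PROGRESS"
        return {"ExportDescription": {"ExportArn": ExportArn, "ExportStatus": status}}


def export_and_wait(client, table_arn, bucket, delay=30, max_attempts=60, sleep=time.sleep):
    # Start the export exactly once. The ClientToken is passed on the assumption
    # that the service uses it to deduplicate a retried start request.
    response = client.export_table_to_point_in_time(
        TableArn=table_arn,
        S3Bucket=bucket,
        ClientToken=str(uuid.uuid4()),
    )
    export_arn = response["ExportDescription"]["ExportArn"]

    # Poll only the read-only describe call until a terminal state is reached.
    for _ in range(max_attempts):
        description = client.describe_export(ExportArn=export_arn)["ExportDescription"]
        status = description["ExportStatus"]
        if status == "COMPLETED":
            return export_arn
        if status == "FAILED":
            raise RuntimeError(f"Export {export_arn} failed")
        sleep(delay)
    raise TimeoutError(f"Export {export_arn} did not finish after {max_attempts} attempts")


client = FakeDynamoDBClient(polls_until_done=3)
arn = export_and_wait(client, "arn:fake:table/demo", "demo-bucket", delay=0)
```

In this shape the mutating request is sent once and only the describe call repeats, so the polling interval has no bearing on how many exports get started.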