Ghoul-SSZ commented on code in PR #41304:
URL: https://github.com/apache/airflow/pull/41304#discussion_r1713935540
##########
airflow/providers/amazon/aws/transfers/dynamodb_to_s3.py:
##########
@@ -148,29 +174,58 @@ def hook(self):
return DynamoDBHook(aws_conn_id=self.source_aws_conn_id)
def execute(self, context: Context) -> None:
- if self.export_time:
+ # There are 2 separate export to point in time configuration:
+ # 1. Full export, which takes the export_time arg.
+ # 2. Incremental export, which takes the incremental_export_... args
+ # Hence export time could not be used as the proper indicator for the
`_export_table_to_point_in_time`
+ # function. This change introduces a new boolean, as the indicator for
whether the operator scans
+ # and export entire data or using the point in time functionality.
+ if self.point_in_time_export:
Review Comment:
I agree that it is a breaking change, but I would argue that it is a
necessary one.
1. The point in time export has 2 separate types: `FULL_EXPORT` and
`INCREMENTAL_EXPORT`. Only `FULL_EXPORT` is using the `export_time` argument,
since it will trigger the full export till that point in time. For
`INCREMENTAL_EXPORT` types the `export_time` is never used, instead, you would
need to define a specific period where you want to export the data. So I think
the current `export_time` is not a good candidate for deciding if the operator
should use point int time export. If I pass all the required params for
incremental export to the `export_table_to_point_in_time` call, it will never
run with `export_time` as the deciding factor.
2. I think `export_table_to_point_in_time_kwargs` would have the same issue,
no? If I pass all the parameter I want for incremental export, it would still
not run it with` export_table_to_point_in_time` since the export_time is
missing.
I like the `export_table_to_point_in_time_kwargs`, but we would still need
to solve the first issue above.
Or am I missing something? 🤔
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]