GitHub user judoole added a comment to the discussion: GCSToGCSOperator source_objects failing to parse xcom
Sorry for coming in late here, @potiuk. By "classic operator", do you mean all
operators created the original way, with a constructor, inheriting from
BaseOperator, etc.?
We actually use the pattern of passing dynamic results (XComs) a lot, because it
automatically adds the producing tasks as upstreams. This gives us a
self-maintaining way of keeping the graph and task dependencies up to date, and
the graph becomes better documentation of the actual dependencies.
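To illustrate why passing results around keeps the graph up to date, here is a toy, pure-Python model (not Airflow code; `Task`, `Output` and `CopyFiles` are hypothetical names). It is loosely modelled on Airflow inferring a dependency when an XCom reference such as `task.output` is passed to a templated field: the constructor scans only the fields listed in `template_fields`, so in this model an operator without templated fields does not pick up the dependency.

```python
from __future__ import annotations

from typing import Any, ClassVar


class Output:
    """Hypothetical reference to another task's future result (cf. XComArg)."""

    def __init__(self, task: Task) -> None:
        self.task = task


class Task:
    # Only these arguments are scanned for Output references.
    template_fields: ClassVar[tuple[str, ...]] = ()

    def __init__(self, task_id: str, **kwargs: Any) -> None:
        self.task_id = task_id
        self.upstream: set[str] = set()
        for name, value in kwargs.items():
            setattr(self, name, value)
        # Passing another task's output into a templated field makes that
        # task an upstream dependency; no explicit `>>` is needed.
        for name in self.template_fields:
            value = getattr(self, name, None)
            items = value if isinstance(value, (list, tuple)) else [value]
            for item in items:
                if isinstance(item, Output):
                    self.upstream.add(item.task.task_id)

    @property
    def output(self) -> Output:
        return Output(self)


class CopyFiles(Task):
    template_fields = ("source_objects",)


list_files = Task("list_files")
copy_task = CopyFiles("copy_files", source_bucket="src", source_objects=list_files.output)
print(copy_task.upstream)  # {'list_files'}
```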
Official operators that lack templated fields, or that verify templated fields
as `str` in the constructor, we rewrite so that they work. For example, we
override the GCSToGCSOperator like so:
```python
from __future__ import annotations

from collections.abc import Sequence

from airflow.models.baseoperator import BaseOperator
from airflow.providers.google.cloud.transfers.gcs_to_gcs import (
    GCSToGCSOperator as BaseGCSToGCSOperator,
)


class GCSToGCSOperator(BaseGCSToGCSOperator):
    def __init__(
        self,
        *,
        source_bucket,
        source_object=None,
        source_objects=None,
        destination_bucket=None,
        destination_object=None,
        delimiter=None,
        move_object=False,
        replace=True,
        gcp_conn_id="google_cloud_default",
        last_modified_time=None,
        maximum_modified_time=None,
        is_older_than=None,
        impersonation_chain: str | Sequence[str] | None = None,
        source_object_required=False,
        exact_match=False,
        match_glob: str | None = None,
        **kwargs,
    ):
        """Because the GCSToGCSOperator cannot handle templated source objects,
        we need to override the constructor."""
        # First, run only the BaseOperator constructor, bypassing
        # BaseGCSToGCSOperator.__init__ and its checks
        BaseOperator.__init__(self, **kwargs)
        # Next, set all fields as the superclass constructor does,
        # skipping the wildcard checks on source_object and source_objects
        self.source_bucket = source_bucket
        self.source_object = source_object
        self.source_objects = source_objects
        self.destination_bucket = destination_bucket
        self.destination_object = destination_object
        self.move_object = move_object
        self.replace = replace
        self.gcp_conn_id = gcp_conn_id
        self.last_modified_time = last_modified_time
        self.maximum_modified_time = maximum_modified_time
        self.is_older_than = is_older_than
        self.impersonation_chain = impersonation_chain
        self.source_object_required = source_object_required
        self.exact_match = exact_match
        self.match_glob = match_glob
        self.delimiter = delimiter
        if self.delimiter:
            raise ValueError(
                "Usage of 'delimiter' is deprecated, please use 'match_glob' instead"
            )
```
Is this a bad pattern? (In my mind, the wildcard check could perhaps happen
after the templates are rendered, for example in `execute`.)
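A minimal sketch of that alternative, using a toy operator rather than the real GCSToGCSOperator: the constructor only stores its arguments, and the wildcard check runs in `execute()` once the lazy value has been resolved. `LazyValue`, `CopyTask` and the one-wildcard-per-path rule are hypothetical stand-ins for an unrendered template or XCom reference and for the provider's own validation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable

WILDCARD = "*"


@dataclass
class LazyValue:
    """Hypothetical placeholder for a value only known at run time."""

    resolve: Callable[[dict[str, Any]], Any]


class CopyTask:
    """Toy operator: __init__ stores arguments, execute() validates them."""

    def __init__(self, *, source_objects: list[str] | LazyValue) -> None:
        # No validation here: source_objects may still be unresolved.
        self.source_objects = source_objects

    def _render(self, context: dict[str, Any]) -> None:
        # Resolve lazy arguments, loosely mimicking template rendering.
        if isinstance(self.source_objects, LazyValue):
            self.source_objects = self.source_objects.resolve(context)

    def execute(self, context: dict[str, Any]) -> list[str]:
        self._render(context)
        # The wildcard check now runs on concrete strings.
        for obj in self.source_objects:
            if obj.count(WILDCARD) > 1:
                raise ValueError(f"Only one wildcard '*' is allowed, got {obj!r}")
        return list(self.source_objects)


task = CopyTask(source_objects=LazyValue(lambda ctx: ctx["upstream_result"]))
print(task.execute({"upstream_result": ["data/a.csv", "data/*.json"]}))
# ['data/a.csv', 'data/*.json']
```

Because `LazyValue` is not iterable, a constructor-time loop over `source_objects` would raise `TypeError` in this model, while the deferred check sees the concrete list.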
GitHub link:
https://github.com/apache/airflow/discussions/42638#discussioncomment-14812851