uranusjr commented on issue #37810:
URL: https://github.com/apache/airflow/issues/37810#issuecomment-2014286445
I gave this quite a bit of thought. I am leaning toward implementing the `return
Metadata(...)` syntax mentioned above, but with a twist: also allowing `yield`,
which resolves the conflict with XCom:
```python
@task(outlets=[Dataset("s3://my/data.json")])
def my_task():
    with ObjectStoragePath("s3://my/data.json").open("w") as f:
        ...  # Write to file...
    yield Metadata(uri="s3://my/data.json", extra={"extra": "metadata"})
    return data  # This goes to XCom!
```
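For illustration, here is a minimal sketch of how a runner could tell the two apart: values yielded by the generator are collected as metadata, while the generator's `return` value (carried on `StopIteration.value`) is what would go to XCom. This is not Airflow's implementation. The `Metadata` dataclass and the `run_task` helper below are hypothetical stand-ins, and the file write is omitted.

```python
import inspect
from dataclasses import dataclass, field


@dataclass
class Metadata:
    """Hypothetical stand-in for the proposed Metadata type."""

    uri: str
    extra: dict = field(default_factory=dict)


def run_task(func, *args, **kwargs):
    """Call func and return (xcom_value, collected_metadata).

    Hypothetical helper: plain functions are passed through unchanged;
    generator functions are driven to completion.
    """
    result = func(*args, **kwargs)
    if not inspect.isgenerator(result):
        return result, []

    collected = []
    while True:
        try:
            item = next(result)
        except StopIteration as stop:
            # A generator's `return x` surfaces as StopIteration.value.
            return stop.value, collected
        if not isinstance(item, Metadata):
            raise TypeError(f"unexpected yielded value: {item!r}")
        collected.append(item)


def my_task():
    data = {"rows": 3}  # Placeholder payload for the sketch.
    yield Metadata(uri="s3://my/data.json", extra={"extra": "metadata"})
    return data  # This is what would go to XCom.


xcom_value, metadata = run_task(my_task)
```

A plain function that never yields passes through `run_task` untouched, so existing tasks would keep their current behavior under this scheme.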
The thing I particularly like about this syntax is that in the future, when XCom
gets its own lineage information and can also take additional metadata, we can
introduce another special type that allows passing in data and metadata at
the same time:
```python
@task(outlets=[Dataset("s3://my/data.json")])
def my_task():
    with ObjectStoragePath("s3://my/data.json").open("w") as f:
        ...  # Write to file...
    return Output(data, extra={"extra": "metadata"})
```
This also opens the door to sending multiple things from a single
function if we allow `yield Output(...)`. I can also imagine future extensions
where the return value does not go to XCom storage, but directly to whatever is
specified in `outlets` (without needing to explicitly write data in the function).
There are a lot of opportunities.
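As a rough sketch of the `Output` idea, a wrapper type could carry the value and its extra metadata together, and the runner could unwrap it on the way out. Everything here is an assumption for illustration: the `Output` dataclass, its `value` and `extra` fields, and the `unwrap` helper are hypothetical names, not an existing Airflow API.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Output:
    """Hypothetical wrapper pairing a value with extra metadata."""

    value: Any
    extra: dict = field(default_factory=dict)


def unwrap(result):
    """Split a task result into (value, extra).

    Hypothetical helper: bare values get empty metadata, so tasks that
    do not use Output behave as before.
    """
    if isinstance(result, Output):
        return result.value, result.extra
    return result, {}


def my_task():
    data = [1, 2, 3]  # Placeholder payload for the sketch.
    return Output(data, extra={"extra": "metadata"})


def my_multi_task():
    # With `yield Output(...)`, one function can emit several outputs.
    yield Output("first", extra={"part": 1})
    yield Output("second", extra={"part": 2})


value, extra = unwrap(my_task())
outputs = [unwrap(item) for item in my_multi_task()]
```

Where each unwrapped value ends up (XCom, or the storage named in `outlets`) is left open here, as it is in the discussion above.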
That said, I think implementing the context-based approach is still a good
first step toward all this. Even with the more magical and convenient
return-as-metadata syntax, using a context variable is still explicit and may
be preferred by some. It is also easier to implement, and should be a good way
to get things rolling by focusing on the core feature here instead of getting
into a ton of syntax design. So I’m going to start with that.