uranusjr commented on PR #34729:
URL: https://github.com/apache/airflow/pull/34729#issuecomment-1748840122
> Not entirely sure about this. To me, for now, both are quite different. A
> dataset points to data and a mount provides an interface that allows you to
> manipulate file-like objects. So not really a reference to a resource imho. But
> maybe I am seeing that wrongly. If you have an example how you think that would
> work on the user side it would help.
I see what you mean. The two do have some similarities, though. Say I want to
trigger a(nother) DAG when a file on S3 is modified; I would write something
like this:
```python
inp = fs.mount("file://my-input.csv")
out = fs.mount("s3://my-warehouse/file.csv")
out_ds = Dataset("s3://my-warehouse/file.csv")

@task(outlets=[out_ds])
def upload(source, target):
    with fs.open(target, "w") as f:
        f.write(source.read())

upload(inp, out)
```
But the fact that I need to explicitly tell Airflow what the task does seems a
bit awkward; it feels like Airflow should be able to just do the right thing.
Admittedly, though, I haven't figured out how exactly Airflow should infer that
`out` should be triggered (but not `inp`!), so maybe that's premature; we can
always figure it out later.
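For what it's worth, one direction the inference could take is tracking access intent on the mount itself: a mount opened for writing is a candidate outlet, one only read from is not. This is purely a hypothetical sketch, not anything in this PR — `TrackingMount` and `infer_outlets` are made-up names, and a real version would return actual file-like objects:

```python
# Hypothetical sketch: a mount wrapper that records whether it was opened
# for writing, so outlets could be inferred instead of declared manually.
class TrackingMount:
    def __init__(self, uri):
        self.uri = uri
        self.written = False  # flipped when opened in a write/append mode

    def open(self, mode="r"):
        if "w" in mode or "a" in mode:
            self.written = True
        # A real implementation would return a file-like object for `uri`;
        # this sketch only records the access intent.
        return self

def infer_outlets(mounts):
    """Return the URIs a task wrote to, i.e. the candidate outlets."""
    return [m.uri for m in mounts if m.written]

inp = TrackingMount("file://my-input.csv")
out = TrackingMount("s3://my-warehouse/file.csv")
out.open("w")   # only the target is opened for writing

infer_outlets([inp, out])  # → ["s3://my-warehouse/file.csv"]
```

With something like this, the `upload` task above would trigger on `out` but not `inp` without the user listing `outlets` by hand. Whether that bookkeeping belongs in the mount layer is exactly the open question, of course.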