uranusjr commented on PR #34729:
URL: https://github.com/apache/airflow/pull/34729#issuecomment-1748840122
> Not entirely sure about this. To me, for now, both are quite different. A
> dataset points to data and a mount provides an interface that allows you to
> manipulate file-like objects. So not really a reference to a resource imho. But
> maybe I am seeing that wrongly. If you have an example how you think that would
> work on the user side it would help.
I see what you mean. The two do have some similarities, though. Say I want to
trigger a(nother) DAG when a file on S3 is modified; I would write something
like this:
```python
inp = fs.mount("file://my-input.csv")
out = fs.mount("s3://my-warehouse/file.csv")
out_ds = Dataset("s3://my-warehouse/file.csv")

@task(outlets=[out_ds])
def upload(source, target):
    with fs.open(target, "w") as f:
        f.write(source.read())

upload(inp, out)
```
But the fact that I need to explicitly tell Airflow what the task does seems a
bit awkward; it feels like Airflow should be able to just do the right thing.
Admittedly, though, I haven't figured out how exactly Airflow should infer that
`out` should be triggered (but not `inp`!), so maybe that's premature; we can
always figure it out later.
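For what it's worth, one direction the inference could take is tracking access intent on the mount itself: a mount opened for writing is a candidate outlet, one only read from is not. This is purely a hypothetical sketch, not anything in this PR — `TrackingMount` and `infer_outlets` are made-up names, and a real version would return actual file-like objects:

```python
# Hypothetical sketch: a mount wrapper that records whether it was opened
# for writing, so outlets could be inferred instead of declared manually.
class TrackingMount:
    def __init__(self, uri):
        self.uri = uri
        self.written = False  # flipped when opened in a write/append mode

    def open(self, mode="r"):
        if "w" in mode or "a" in mode:
            self.written = True
        # A real implementation would return a file-like object for `uri`;
        # this sketch only records the access intent.
        return self

def infer_outlets(mounts):
    """Return the URIs a task wrote to, i.e. the candidate outlets."""
    return [m.uri for m in mounts if m.written]

inp = TrackingMount("file://my-input.csv")
out = TrackingMount("s3://my-warehouse/file.csv")
out.open("w")   # only the target is opened for writing

infer_outlets([inp, out])  # → ["s3://my-warehouse/file.csv"]
```

With something like this, the `upload` task above would trigger on `out` but not `inp` without the user listing `outlets` by hand. Whether that bookkeeping belongs in the mount layer is exactly the open question, of course.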