CorsettiS opened a new issue, #30347:
URL: https://github.com/apache/airflow/issues/30347

   ### Description
   
   Just like remote logging, it would be very useful to allow the _DAGS_FOLDER_ & _PLUGINS_FOLDER_ settings to point to a path on any cloud provider. Ideally, a connection to the cloud provider would be opened only while the DAGs are being parsed. One possible implementation I thought about: create temp dirs holding the downloaded contents of the _DAGS_FOLDER_ & _PLUGINS_FOLDER_ buckets and point _DAGS_FOLDER_ & _PLUGINS_FOLDER_ at them; then, every time the DAGs are parsed, download the cloud contents into another dynamically created temp dir, compare them with the ones in use, and replace any DAGs or plugins that have changed.
   
   That is:
   1. Add new config options **remote_dags_folder_conn_id** & **remote_plugins_folder_conn_id** so the cloud-provider credentials can be fetched with a mechanism similar to the one currently used for remote logging.
   2. If _DAGS_FOLDER_ or _PLUGINS_FOLDER_ starts with **s3://**, **gs://**, etc., then:
   3. A temp dir is created locally.
   4. The files from the referenced cloud path are downloaded into the temp dir, which Airflow then uses as its dag_folder & plugins_folder.
   5. Each time the DAGs folder is parsed, the current remote version is downloaded into a new dynamically created temp dir and compared with the temp dir in use; if anything changed, the newer files overwrite the previous ones where needed.
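   The steps above could be sketched roughly as follows. This is only an illustration of the proposed flow, not an actual implementation: `download` stands in for a provider-specific hook (e.g. an S3 or GCS hook selected via the proposed **remote_dags_folder_conn_id**), and all names here are hypothetical.

   ```python
   import hashlib
   import shutil
   import tempfile
   from pathlib import Path
   from urllib.parse import urlparse

   # Step 2: schemes that mark a folder setting as remote (illustrative list).
   REMOTE_SCHEMES = {"s3", "gs", "abfs"}


   def is_remote(folder: str) -> bool:
       """Detect cloud paths by URL scheme (s3://, gs://, ...)."""
       return urlparse(folder).scheme in REMOTE_SCHEMES


   def _digest(path: Path) -> str:
       """Content hash used to decide whether a file changed."""
       return hashlib.md5(path.read_bytes()).hexdigest()


   def sync_remote_folder(download, current_dir: Path) -> None:
       """Steps 3-5: download into a fresh temp dir, diff against the
       dir currently in use, and overwrite only the files that changed.

       `download` is a callable(dest_dir) that fetches the remote bucket
       contents; in a real implementation it would come from the cloud
       provider hook resolved from the connection id.
       """
       with tempfile.TemporaryDirectory() as staging:
           staging_dir = Path(staging)
           download(staging_dir)
           for new_file in staging_dir.rglob("*"):
               if not new_file.is_file():
                   continue
               rel = new_file.relative_to(staging_dir)
               old_file = current_dir / rel
               if not old_file.exists() or _digest(old_file) != _digest(new_file):
                   old_file.parent.mkdir(parents=True, exist_ok=True)
                   shutil.copy2(new_file, old_file)
   ```

   The scheduler could run `sync_remote_folder` at the start of each parsing loop, so only changed DAG or plugin files are rewritten locally between parses.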
   
   ### Use case/motivation
   
   The first motivation is that it would make it easier to run Airflow on Kubernetes, where both the scheduler and the workers need access to up-to-date DAGs & plugins folders. As of now that is not straightforward to set up, especially since the current best approach involves gitSync, which may be blocked by a company's cluster policies (as in my case). Having those folders reachable from a cloud provider would make the Airflow setup easier as a whole.
   The second motivation is that it is an elegant way of decoupling Airflow code from infrastructure. Many organisations keep a single git repo containing Airflow DAGs & plugins plus infra-specific files (Dockerfile, Docker Compose YAML, requirements .txt files, etc.), and it would be nice to at least have the option of separating those.
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

