Hi all, I have a question regarding the processing of individual files:
We collect flat files from different sources in CSV, raw, and unstructured formats. The files are stored in a "{process}/YYYY/MM/DD/" hierarchy, and we've built a GCSToGCSTransform operator that runs a download/transform/upload loop over each file in the directory. This works OK, but the DAGs are getting a bit messy, and because the logic lives inside each DAG, I see very little potential for code reuse.

One suggestion we've received is to move the functionality into shared libraries and callable script files, so it can be leveraged across multiple DAGs. I can also imagine some people packaging the transform step in Docker containers and running those in the cloud, telling each container where to pick up the files and where to put the results. So I'm wondering: has anyone found effective ways to deal with this, and what's considered best practice in Airflow?

Rgds, Gerard
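To make the question concrete, here is a minimal sketch of the kind of refactoring I have in mind: pulling the per-file download/transform/upload loop out of the operator into a plain library function that takes the storage operations as parameters. All names here (`transform_files`, `list_fn`, etc.) are hypothetical, not Airflow or GCS APIs; in a real DAG you'd bind the callables to GCS hook methods.

```python
# Hypothetical sketch of a reusable per-file transform loop.
# The storage operations are injected as callables, so the same
# function can be driven by GCS hooks in production or by plain
# dicts in tests. None of these names come from Airflow itself.
from typing import Callable, Iterable


def transform_files(
    list_fn: Callable[[str], Iterable[str]],       # list object names under a prefix
    download_fn: Callable[[str], bytes],            # fetch one object's bytes
    transform_fn: Callable[[bytes], bytes],         # the per-file transformation
    upload_fn: Callable[[str, bytes], None],        # write result bytes to a name
    src_prefix: str,
    dst_prefix: str,
) -> int:
    """Apply transform_fn to every object under src_prefix,
    writing results under dst_prefix; return the file count."""
    count = 0
    for name in list_fn(src_prefix):
        data = download_fn(name)
        result = transform_fn(data)
        # Keep the relative "{process}/YYYY/MM/DD/..." path, swap the prefix.
        upload_fn(dst_prefix + name[len(src_prefix):], result)
        count += 1
    return count
```

The idea would be that each DAG only supplies the prefixes and the `transform_fn`, and invokes this via a PythonOperator or a thin custom operator from a shared package, instead of duplicating the loop.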