Hello everyone,

TL;DR: I noticed that we are accumulating dependencies on external binaries (downloads and Docker images), which makes the Apache Airflow community a bit vulnerable to those external sources. I would love your comments and opinions on the proposal I made around this.

*More explanation/status:*

Depending on binaries that are officially "released" and "managed" by the owning organizations is fine, but I think it is a bit risky to depend on the others long term, and we should aim to bring all those "vulnerable" dependencies under community control.

I reviewed all our code (or at least I think all of it!) looking for such dependencies and prepared an "umbrella" issue where I proposed the approach we can take for each of them. I could have missed some, so if you find others, feel free to comment and add them. All the details are captured here: https://github.com/apache/airflow/issues/9401 - there I discuss the context, motivation, current status, and the approach we can take for those dependencies.

Many of those dependencies just need a review and maybe an update to the latest version, and I do not think there is a lot to discuss for those. There is one point, however, that I think requires more deliberate action and some decisions. We depend on Docker images from various sources:

1) officially maintained images
2) images released by organizations for their own purposes, but not "officially maintained" by those organizations
3) images released by private individuals

While 1) is perfectly OK, I think for 2) and 3) we should bring the images under Airflow community management. Here is the list of images I found that need to be moved to Airflow:

- aneeshkj/helm-unittest
- ashb/apache-rat:0.13-1
- godatadriven/krb5-kdc-server
- polinux/stress (?)
- osixia/openldap:1.2.0
- astronomerinc/ap-statsd-exporter:0.11.0
- astronomerinc/ap-pgbouncer:1.8.1
- astronomerinc/ap-pgbouncer-exporter:0.5.0-1

*Proposal*: My proposal is to make a folder in our repository on GitHub (continuing with the mono-repo approach we follow) to keep the corresponding Dockerfiles and the scripts that build and release the images from there. Now the only question is where to keep those images.
We currently have apache/airflow, but I think we should reserve it for Airflow images only and keep those images elsewhere. Unfortunately, we cannot have "sub-images" of any sort in DockerHub. We are already abusing the "apache/airflow" namespace a bit, as we keep both CI and production images there (but that's quite OK as the images are similar). My proposal is to create an *"apache/airflow-ext"* DockerHub repository and keep the images there. That repository will also be slightly abused, because we will have to distinguish the images by tag, for example:

- apache/airflow-ext:helm-unittest-[version]
- apache/airflow-ext:apache-rat-[version]

I am also open to other names for the repo and to proposals for other ways to handle this.

I believe there is no licensing issue with any of those images (Ash, Kaxil, Fokko - some of the images are Astronomer's/GoDataDriven's - can you comment on that?), and that the licensing of all of them allows us to copy with attribution (I will double-check that for the other images).

WDYT?

J.

--
Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer
M: +48 660 796 129
