Hello everyone,

TL;DR: I noticed that we are accumulating dependencies on external
binaries (downloads and Docker images), which makes the Apache Airflow
community somewhat vulnerable to third parties. I would love your
comments/opinions on the proposal I made around this.

*More explanation/status:*

While depending on binaries that are officially "released" and "managed" by
the owning organizations is fine, I think it is risky to depend on the
others long term, and we should aim to bring all those "vulnerable"
dependencies under community control.

I reviewed all our code (or at least I think all of it!) looking for such
dependencies and prepared an "umbrella" issue where I proposed the approach
we can take for each of them.

I could have missed some, so if you find others, feel free to comment and
add them.
All the details are captured here:
https://github.com/apache/airflow/issues/9401 - I discussed the
context/motivation/current status and approach we can take for those
dependencies.

A lot of those dependencies just need a review and maybe an update to the
latest versions, and I do not think there is much to discuss for those.

There is one point, however, that I think requires more deliberate action
and some decisions.

We have some dependencies on Docker images that we are using from various
sources:
1) officially maintained images
2) images that organizations released for their own purposes but do not
"officially maintain"
3) images released by private individuals

While 1) is perfectly OK, I think for 2) and 3) we should bring the images
under Airflow community management. Here is the list of the images I found
that need to be moved to Airflow:

   - aneeshkj/helm-unittest
   - ashb/apache-rat:0.13-1
   - godatadriven/krb5-kdc-server
   - polinux/stress (?)
   - osixia/openldap:1.2.0
   - astronomerinc/ap-statsd-exporter:0.11.0
   - astronomerinc/ap-pgbouncer:1.8.1
   - astronomerinc/ap-pgbouncer-exporter:0.5.0-1


*Proposal*:

My proposal is to create a folder in our GitHub repository (continuing the
mono-repo approach we follow) to keep the corresponding Dockerfiles and the
scripts that build and release the images. The only remaining question is
where to keep those images. We currently have apache/airflow, but I think we
should reserve it for Airflow images only and keep the other images
elsewhere. Unfortunately, we cannot have "sub-images" of any sort in
DockerHub. We are already abusing the "apache/airflow" namespace a bit, as
we keep both CI and production images there (but that's quite OK as
the images are similar).

My proposal is to create an *"apache/airflow-ext"* DockerHub
repository and keep the images there. That repository will also be
slightly abused, because we will have to tell the images apart by tag, for
example:

   - apache/airflow-ext:helm-unittest-[version]
   - apache/airflow-ext:apache-rat-[version]
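
To make the naming scheme above concrete, here is a minimal sketch of how an
upstream image reference could be mapped to a tag in the proposed repository.
The helper name `ext_tag` and the fallback to "latest" for untagged images are
my own assumptions for illustration, not something we have agreed on, and the
sketch does not handle references that include a registry host with a port.

```python
# Hypothetical helper, for illustration only: derive the proposed
# "apache/airflow-ext" tag from an upstream image reference.
TARGET_REPO = "apache/airflow-ext"

# Images from the list in this email.
UPSTREAM_IMAGES = [
    "aneeshkj/helm-unittest",
    "ashb/apache-rat:0.13-1",
    "godatadriven/krb5-kdc-server",
    "polinux/stress",
    "osixia/openldap:1.2.0",
    "astronomerinc/ap-statsd-exporter:0.11.0",
    "astronomerinc/ap-pgbouncer:1.8.1",
    "astronomerinc/ap-pgbouncer-exporter:0.5.0-1",
]


def ext_tag(upstream: str, default_version: str = "latest") -> str:
    """Map e.g. 'ashb/apache-rat:0.13-1' to 'apache/airflow-ext:apache-rat-0.13-1'."""
    # Split off the version; it is empty when the reference has no tag.
    name, _, version = upstream.partition(":")
    # Drop the owner namespace ('ashb/'), keeping only the image name.
    short_name = name.rsplit("/", 1)[-1]
    # Assumption: untagged upstream images fall back to default_version.
    return f"{TARGET_REPO}:{short_name}-{version or default_version}"


for image in UPSTREAM_IMAGES:
    print(f"{image} -> {ext_tag(image)}")
```

Encoding the image name in the tag is exactly the "abuse" mentioned above: it
works within a single DockerHub repository, at the cost of tags no longer
being plain versions.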

I am also open to other names for the repo and to proposals for other ways
to handle that.

I believe there is no issue with the licences for any of those images, and
that the licensing of all of them allows us to copy them with attribution
(I will double-check that for the other images). Ash, Kaxil, Fokko - some of
the images are Astronomer's/GoDataDriven's - can you comment on that?

WDYT?

J.



-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>