I think I'd feel more comfortable if we had it all under the "community" umbrella.

- For dev images - I think we have a good idea from couchdb. I will make a
  POC of that and open a PR shortly. I have already created an airflowdev
  account on Dockerhub and will make it available to the PMC members of
  Airflow and connect it to our repo to automate the dev dependencies.
- For the runtime (astronomer) images I took a deeper look and I think it
  makes perfect sense to add them and have them released by the Airflow
  Community as well.

Here is what is in those images:

- astronomerinc/ap-statsd-exporter
  <https://hub.docker.com/layers/astronomerinc/ap-statsd-exporter/latest/images/sha256-69538dc71521489733bb21823505a75a02a4c54d1d07eaa2be9fa7eb58763b7f?context=explore>
  - this image is just based on the official Prometheus Statsd exporter with
  an added file, "/etc/statsd-exporter/mappings.yml". So the maintenance is
  mainly about maintaining the mapping and occasionally upgrading to the
  latest released prometheus-statsd. The first one sounds like a good idea
  for community work, the second we can easily automate - the same way as we
  do for production images. It seems that this one is updated once every few
  months, so we can easily do that.
- astronomerinc/ap-pgbouncer
  <https://hub.docker.com/layers/astronomerinc/ap-pgbouncer/latest/images/sha256-9820007e1e62eb988cb603929b1eaf0989052cd01b73a3004274b21d143f9654?context=explore>
  - this is just packaging pgbouncer into an image. This one seems to have
  been updated more frequently in the past, but I think now it is just a
  matter of following the releases of pgbouncer, libressl and libressl-dev.
- astronomerinc/ap-pgbouncer-exporter
  <https://hub.docker.com/layers/astronomerinc/ap-pgbouncer-exporter/latest/images/sha256-6e9d4f2d66dafecfd2f29239a1957edb2c953b8299e487ec8b04a96206d2da4e?context=explore>
  - this is a pgbouncer exporter based on Juraj Bubniak's PGBouncer
  Prometheus exporter with the libressl and libressl-dev libraries upgraded.
  Also usually updated every few months. Here I think it would also make
  sense to bring the source code of Juraj's image into the community as well.

I also think it would make sense (unlike for the dev dependencies) to publish
all "runtime" deps under the "apache/airflow" repository. That would be a bit
awkward, but I think it takes the least "effort" to maintain and makes sure
the images are officially "blessed" during the release.

So the proposal I have (if we use calver versioning similar to backport
packages):

- apache/airflow:statsd-exporter-2020.6.31
- apache/airflow:pgbouncer-2020.6.31
- apache/airflow:pgbouncer-exporter-2020.6.31

I am happy to bring it all to our repo and set up the automation.

J.

On Thu, Jun 25, 2020 at 11:19 AM Ash Berlin-Taylor <[email protected]> wrote:

> Wow Kamil that's an awesome and mature process for a company to take --
> I wish more companies treated open source deps that way.
>
> As I mentioned in the original Helm PR (but just in a comment left on a
> review), I left a few of the "support" Docker images as astronomerinc
> ones as the upstream Docker images are "unmaintained" (that isn't to say
> the projects are, just that the images aren't re-published in a timely
> fashion to update openssl etc.)
>
> I am happy to replace the astronomerinc support images with others if we
> want to. I am also happy to clarify/make explicit the license situation
> that those images are distributed under (Apache 2) if we want to stick
> with them and let us (Astronomer) carry the burden of patching and
> updating them -- it is after all part of what people pay us for so we'll
> be doing it anyway.
>
> > Besides, we should provide the possibility to replace "Object code" with
> > other objects i.e., use of an image from a private third-party registry.
>
> The images to use come from the helm values, so are easily changeable at
> helm install/upgrade time:
>
> https://github.com/apache/airflow/blob/ec0025f35be212b248c284efa04acf2d96845681/chart/values.yaml#L68-L92
>
> -ash
>
> On Jun 24 2020, at 9:07 am, Kamil Breguła <[email protected]> wrote:
>
> > These files have no information to determine the license. In my opinion,
> > these images ("Derivative Works") should be treated as Astronomer's or
> > other users' copyrighted files. Please note that Astronomer may distribute
> > the images under a different license, but they need to acknowledge the use
> > of the Foundation or other licensed software. To do otherwise would be
> > stealing.
> >
> > DockerHub is not an Open Source software registry, and we cannot assume
> > that every image there is available under a license that allows free use.
> >
> > **What does this mean for the project?**
> >
> > This is incompatible with the Apache license because each runtime
> > dependency must also be based on an Apache-compatible license.
> > These images are required to run the Helm Chart, so they are its
> > dependencies. Dependencies that are not compatible with the Apache license
> > are a problem for our users and prevent the use of this project.
> >
> > **How do we deal with this topic in my organization?**
> >
> > We take the topic of copyright very seriously in my organization. One of
> > the steps we take before publishing a derivative work based on an
> > Open-Source license is to audit the source code to see if each part is
> > under a license that allows us to use it. If we build images or artifacts
> > automatically, we take steps that prevent the accidental publication of an
> > artifact that could contain works that have an incorrect license.
> >
> > We do this by building the audited internal registry:
> > - In the case of Airflow, this is a copy of the source code and the
> > necessary PIP libraries stored in the blockchain-based registry
> > (append-only registry). Any change in such a registry undergoes a review
> > process and must be approved. It is not possible to revert an approved
> > change without leaving a trace.
> > - In the case of Docker images, this means that each image is built
> > automatically, and no one publishes the images to the image registry
> > manually (docker push). No step can download files from a registry that is
> > not auditable.
> >
> > Such steps allow you to recreate the software development process, e.g. in
> > the case of a court case.
> >
> > In our case, it won't be easy to introduce all similar requirements, but we
> > can try to be compatible with them so that organizations that have the same
> > requirements can meet them.
> >
> > **What should we do?**
> >
> > In my opinion, this is similar to using libraries in our application. We do
> > not perform a publisher assessment for every library we use. We only verify
> > license compliance.
> >
> > On the other hand, it looks different because it is "Object Code", not
> > "Source Code". We do not use source code directly, but we use an object
> > prepared by a third party - "Derivative Works".
> >
> > In my opinion, relying on any Docker image ("Object Code") is OK if it
> > meets the following requirements:
> > - The Source Code required to create the object should be publicly
> > available and should be compatible with the Apache license.
> > - We should have access to Compilation Information. The Compilation
> > Information must suffice to ensure that the continued functioning of the
> > source code is in no case prevented or interfered with solely because
> > modification has been made.
> >
> > Besides, we should provide the possibility to replace "Object code" with
> > other objects i.e., use of an image from a private third-party registry.
> >
> > Thanks, Jarek, for paying attention to this issue. I didn't think about it
> > before, but now I know I couldn't use the Helm Chart in its current form in
> > any of my work. I am afraid that many members of our community would face
> > similar problems if they tried to use it in a production environment.
> >
> >
> > On Mon, Jun 22, 2020 at 3:08 PM Ash Berlin-Taylor <[email protected]> wrote:
> >
> >> Licensing wise there is no issue from me: The astronomerinc images are
> >> just re-packaging of the upstream images to apply security fixes so are
> >> licensed under whatever the original image is (MIT or Apache2 usually,
> >> else we wouldn't have put them in the helm chart PR)
> >>
> >> For background, the reason that we at Astronomer created
> >> ap-pgbouncer-exporter in the first place is that the upstream package
> >> does not patch/rebuild to address security vulnerabilities. By taking
> >> this in to airflow-ext it means we as a project become responsible for
> >> monitoring and testing that.
> >> (And don't be fooled in to thinking the
> >> free scanners can detect all vulns here, we've found them to be of very
> >> variable and questionable accuracy.)
> >>
> >> That is a non-trivial amount of work for an open source project.
> >>
> >> Has this ever caused us any problems outside of Pip/python dependencies?
> >> (I'm not aware of any.) For runtime this maybe makes sense (again, I'm
> >> not yet convinced), but for test-only/dev-only deps this seems like a
> >> lot of work that we could better spend on working on Airflow. If we pin
> >> the versions of the docker images used then the only real risk is a
> >> left-pad scenario of "I'm deleting all my images" which is a minor risk.
> >>
> >> Do any other projects do anything like this? I haven't seen it before.
> >>
> >> I'd vote for doing nothing and addressing this in specific cases when it
> >> becomes a problem. Because I do not see using third party docker images
> >> as a risk. I see it as a time saving measure.
> >>
> >> -ash
> >>
> >> On Jun 22 2020, at 1:42 pm, Jarek Potiuk <[email protected]> wrote:
> >>
> >> > Hello everyone,
> >> >
> >> > TL;DR; I noticed that we are accumulating some dependencies on external
> >> > binaries (downloads and Docker images) which make the Apache Airflow
> >> > Community a bit vulnerable to external dependencies. I would love your
> >> > comments/opinions on the proposal I made around this.
> >> >
> >> > *More explanation/status:*
> >> >
> >> > While depending on binaries that are officially "released" and "managed"
> >> > by the owning organizations is fine, I think it is a bit risky to depend
> >> > on the rest long term and I think we should aim to bring all those
> >> > "vulnerable" dependencies into community control.
> >> >
> >> > I reviewed all our code (or I think all !) looking for such dependencies
> >> > and prepared an "umbrella" issue where I proposed the approach we can
> >> > take for all such dependencies.
> >> >
> >> > I could have missed some - so if you find others feel free to comment/add
> >> > the new ones.
> >> > All the details are captured here:
> >> > https://github.com/apache/airflow/issues/9401 - I discussed the
> >> > context/motivation/current status and approach we can take for those
> >> > dependencies.
> >> >
> >> > A lot of those dependencies just need review and maybe some updates to
> >> > latest versions. And I do not think there is a lot to discuss for those.
> >> >
> >> > There is one point, however, that requires more deliberate action and
> >> > some decisions I think.
> >> >
> >> > We have some dependencies on Docker images that we are using from various
> >> > sources:
> >> > 1) officially maintained images
> >> > 2) images released by organizations that released them for their own
> >> > purpose, but they are not "officially maintained" by those organizations
> >> > 3) images released by private individuals
> >> >
> >> > While 1) is perfectly OK, I think for 2) and 3) we should bring the
> >> > images to Airflow community management. Here is the list of those images
> >> > I found that need to be moved to Airflow:
> >> >
> >> > - aneeshkj/helm-unittest
> >> > - ashb/apache-rat:0.13-1
> >> > - godatadriven/krb5-kdc-server
> >> > - polinux/stress (?)
> >> > - osixia/openldap:1.2.0
> >> > - astronomerinc/ap-statsd-exporter:0.11.0
> >> > - astronomerinc/ap-pgbouncer:1.8.1
> >> > - astronomerinc/ap-pgbouncer-exporter:0.5.0-1
> >> >
> >> >
> >> > *Proposal*:
> >> >
> >> > My proposal is to make a folder in our repository on Github (continue
> >> > with the mono-repo approach we follow) to keep corresponding Dockerfiles
> >> > and scripts that build and release images from there. Now the only
> >> > question is where to keep those images. We currently have apache/airflow
> >> > but I think we should reserve it for airflow images only and we should
> >> > keep those images elsewhere.
> >> > Unfortunately, we cannot have "sub-images" of any sort in
> >> > DockerHub. We are already abusing the "apache/airflow" namespace a bit,
> >> > as we are keeping both CI and production images there (but that's quite
> >> > OK as the images are similar).
> >> >
> >> > My proposal will be to create an *"apache/airflow-ext"* DockerHub
> >> > repository and keep the images there. It will also be a little
> >> > abused because we will have to name the images with tags - for example:
> >> >
> >> > - apache/airflow-ext:helm-unittest-[version]
> >> > - apache/airflow-ext:apache-rat-[version]
> >> >
> >> > I am also open to other names for the repo and proposals for other ways
> >> > to handle that.
> >> >
> >> > I believe there is no issue with licences for any of those images (Ash,
> >> > Kaxil, Fokko - some of the images are Astronomer's/GoDataDriven's ones -
> >> > can you comment on that ?) but I believe licensing on all those images
> >> > is ok for us to copy with attribution (I will double-check that for
> >> > other images).
> >> >
> >> > WDYT?
> >> >
> >> > J.
> >> >
> >> > --
> >> >
> >> > Jarek Potiuk
> >> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >> >
> >> > M: +48 660 796 129 <+48660796129>
> >> > [image: Polidea] <https://www.polidea.com/>
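
For reference, the component-prefixed calver tags proposed at the top of this thread (apache/airflow:<component>-<calver>) could be produced by a small helper along these lines. This is only a sketch under assumptions: the function name `runtime_image_tag` is hypothetical and not part of Airflow, and it assumes the calver part is YEAR.MONTH.DAY without zero padding, which is how the example tags in the thread look.

```python
from datetime import date

# The single DockerHub repository named in the proposal.
REPOSITORY = "apache/airflow"


def runtime_image_tag(component: str, release: date) -> str:
    """Build a tag such as 'apache/airflow:pgbouncer-2020.6.30'.

    Hypothetical helper for illustration only. Assumes the calver part is
    YEAR.MONTH.DAY with no zero padding.
    """
    # Reject names that would break the "repo:tag" structure.
    if not component or ":" in component or "/" in component:
        raise ValueError(f"invalid component name: {component!r}")
    calver = f"{release.year}.{release.month}.{release.day}"
    return f"{REPOSITORY}:{component}-{calver}"


if __name__ == "__main__":
    # The three runtime support images discussed in the thread.
    for name in ("statsd-exporter", "pgbouncer", "pgbouncer-exporter"):
        print(runtime_image_tag(name, date(2020, 6, 30)))
```

Since the Helm chart reads image names from its values, a tag built this way could be substituted at helm install/upgrade time without changing the chart itself.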
