Sure, we could do that as well if we agree on it. Just to explain: the repository is really a "fork" of the original one with our modifications on top. The only reason it's not an "actual" GitHub fork is that I cannot create a fork in the "apache" organisation.
J. On Mon, Jul 6, 2020 at 2:22 PM Ash Berlin-Taylor <[email protected]> wrote: > I've just taken a look at the > https://github.com/apache/airflow-pgbouncer-exporter (I'm assuming the > others are the same) and "woah, wait" was my reaction. > > Having a repo where we include the Dockerfile and build scripts: I'm > okay with that. > > This approach where we have an entire copy of the code and have > essentially forked the upstream project: not happy, verging on a > -1/veto of this approach. > > I.e. I'd prefer this repo was just a Dockerfile that pulls the upstream > project from a published release/git tag/pinned commit sha. > > -ash > > On Jul 6 2020, at 12:46 pm, Jarek Potiuk <[email protected]> wrote: > > > One more comment. I started the discussion on the builds devlist of > Apache: > > > https://lists.apache.org/thread.html/rf2af2a95e7687fe94ede23fe9df388f784c8231a5968b109f677cbe8%40%3Cbuilds.apache.org%3E > > and so far there are no conclusive answers. It is something that is not > > clearly regulated by ASF rules, it seems. > > > > So it seems to me we are free to choose our approach (for now). > > > > But I have found this at least: > > > > https://www.apache.org/legal/release-policy.html#what > > > > "The Apache Software Foundation produces open source software. All > releases > > are in the form of the source materials needed to make changes to the > > software being released. In some cases, binary/bytecode packages are also > > produced as a convenience to users that might not have the appropriate > > tools to build a compiled version of the source. In all such cases, the > > binary/bytecode package must have the same version number as the source > > release and may only add binary/bytecode files that are the result of > > compiling that version of the source code release." > > > > I think "the spirit" of that chapter is what I have been referring > > to from the beginning of the thread.
> > > > I really think if we give our users a convenient way of using some binary > > packages (i.e. Docker images), there should be an easy way to reproduce > > those from sources. I have the feeling that my proposal is simply an > > embodiment of that rule. Glad to hear what others think about it. I am > fully > > aware it is a "gray" area, but I think at very little cost we can > move > > it to the "white" area. > > > > J. > > > > > > > > On Sun, Jul 5, 2020 at 11:42 AM Jarek Potiuk <[email protected]> > > wrote: > > > >> Hello Everyone, > >> > >> TL;DR: I did some experiments with those images and I have a workable proposal on > >> how we can handle that. > >> > >> I already created a few repos to see how it can work and I think I > >> have a > >> workable and rather easy-to-maintain solution. We can still > >> delete them > >> if we choose another way, of course; I just wanted to make sure everything > below > >> is "workable", so I implemented a complete, working solution. > It's > >> not that complex, but it was good that I did it: I found a few things that > >> had to be fixed in Dockerfiles and build scripts provided by upstream > >> repos. I also made sure that we are using the latest patched versions of > >> all the tools. In all cases we can rebuild everything from sources, so > >> we do > >> not have to rely on some binary that we trust was built from the sources > >> (other than official images). > >> > >> Happy to hear any comments, but I propose that if the below looks > >> good to > >> you, we get a lazy consensus and I simply implement and document it. I > >> would also make it a rule for our images that we keep that approach for > >> future images. > >> > >> *More details:* > >> > >> 1) I brought all the images to the "apache/airflow" DockerHub registry: both > >> dev images and the ones used in the chart.
I tried to have a > >> separate "airflowdev" user, but it turned out not to be a good idea: it's > >> either a one-user account or an organization with up to three people for > free. > >> That would be a bit of a hassle to manage with 2-factor authentication etc. > >> I think it's actually quite good to have an > >> "apache/airflow:helm-unittest-2020.07.10-v0.2.0-v3.1.2" image. Docker > >> works well in this setup and I think it's rather nice to have all the > >> images in one registry. > >> > >> 2) We have three more repos where I cloned the code for those images > that > >> required a "whole" repo and made them standalone, i.e. depending only on > >> official images/binaries released by organizations "owning" the code in > >> question and on the code that is officially released in the official > >> "apt" or > >> "apk" (Alpine) repositories. I made some Airflow-specific modifications > >> there (labels, maintainer, sometimes some configuration changes, build > >> scripts). Those changes are merged as separate commits, so we should be > able > >> to bring in upstream changes from those repos rather easily if we want. > Those > >> are the repos: > >> > >> * https://github.com/apache/airflow-pgbouncer-exporter > >> * https://github.com/apache/airflow-openldap > >> * https://github.com/apache/airflow-helm-unittest > >> > >> 3) For those images that did not require a whole separate repository, I > >> created scripts/Dockerfile folders in two PRs: the "chart/dockerfiles > >> <https://github.com/apache/airflow/pull/9650>" directory for "helm" > >> images and "scripts/ci/dockerfiles > >> <https://github.com/apache/airflow/pull/9652>" for CI images. > >> > >> 4) All the images are based on "alpine", "debian-slim" or > >> "ubuntu-slim" images and they are optimized for size.
> >> > >> 5) All the images follow similar naming conventions and have similar build > >> scripts that you can simply run to rebuild the images from scratch > (bumping > >> the versions and bringing in upstream changes first, as needed). An example > build > >> script is below. It will be very easy to upgrade those images as > >> needed and > >> release them separately or all at the same time. Example naming > convention: > >> > >> *apache/airflow:airflow-pgbouncer-2020.07.10-1.14.0* > >> > >> Legend: > >> > >> * *pgbouncer* - image released by Airflow > >> * *1.14.0* - version of pgbouncer > >> * *2020.07.10* - CalVer version of the image (roughly, the date when > the > >> image was released/created by Airflow) > >> > >> > >> 6) All images have a consistent labeling scheme, including the commit SHA > >> used to generate the image: > >> > >> > >> > >> > >> > >> > >> * "Labels": { > >> "org.apache.airflow.airflow_pgbouncer.version": "2020.07.10", > >> "org.apache.airflow.commit_sha": > >> "43e6406a84d2589bd54c3c37ceaa0c3ebaa9de26", > >> "org.apache.airflow.component": "pgbouncer", > >> "org.apache.airflow.pgbouncer.version": "1.14.0" }* > >> > >> > >> 7) No regular maintenance is needed for CI images: we can bump them > from > >> time to time on an ad-hoc basis or when we need a newer version. For > >> Helm images I think we should release new versions of those images every > >> time we release the Helm chart; we can then rebuild the images using the > >> latest patches of Debian/Alpine and the latest versions of the software > >> we have > >> in them. > >> > >> 8) Example build script > >> > >> #!/usr/bin/env bash > >> # Licensed to the Apache Software Foundation (ASF) under one > >> # ...
licence here > >> set -euo pipefail > >> DOCKERHUB_USER=${DOCKERHUB_USER:="apache"} > >> DOCKERHUB_REPO=${DOCKERHUB_REPO:="airflow"} > >> PGBOUNCER_VERSION="1.14.0" > >> AIRFLOW_PGBOUNCER_VERSION="2020.07.10" > >> > >> # Run from the script's directory so that "git rev-parse" reads this repository > >> cd "$( dirname "${BASH_SOURCE[0]}" )" || exit 1 > >> COMMIT_SHA=$(git rev-parse HEAD) > >> > >> > >> > TAG="${DOCKERHUB_USER}/${DOCKERHUB_REPO}:airflow-pgbouncer-${AIRFLOW_PGBOUNCER_VERSION}-${PGBOUNCER_VERSION}" > >> > >> docker build . \ > >> --pull \ > >> --build-arg "PGBOUNCER_VERSION=${PGBOUNCER_VERSION}" \ > >> --build-arg > "AIRFLOW_PGBOUNCER_VERSION=${AIRFLOW_PGBOUNCER_VERSION}" \ > >> --build-arg "COMMIT_SHA=${COMMIT_SHA}" \ > >> --tag "${TAG}" > >> > >> docker push "${TAG}" > >> > >> > >> J. > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> On Thu, Jul 2, 2020 at 2:12 PM Jarek Potiuk <[email protected]> > >> wrote: > >> > >>> And the right Greg here :( > >>> > >>> J. > >>> > >>> > >>> > >>> On Thu, Jul 2, 2020 at 12:18 PM Jarek Potiuk <[email protected] > > > >>> wrote: > >>> > >>>> Hey Ash, Greg, Daniel, > >>>> > >>>> So, do I understand correctly that there is no problem with licenses for those images and > >>>> we can get/use the sources for those? > >>>> > >>>> I would love to add the scripts/Dockerfiles to the sources, to be > able > >>>> to rebuild the images. I have some of those already and would like > >>>> to make > >>>> a PR, but it would be great if we could get the Dockerfile sources. > >>>> I also > >>>> want to ask a few questions about versions of the base images (some > >>>> of the > >>>> base images seem to be quite old and there are newer releases, so I > wanted > >>>> to check if there is anything to prevent upgrading them).
> >>>> > >>>> J > >>>> > >>>> > >>>> On Thu, Jun 25, 2020 at 12:58 PM Jarek Potiuk < > [email protected]> > >>>> wrote: > >>>> > >>>>> > >>>>> On Thu, Jun 25, 2020 at 12:27 PM Ash Berlin-Taylor <[email protected]> > >>>>> wrote: > >>>>> > >>>>>> > - apache/airflow:statsd-exporter-2020.6.31 > >>>>>> > - apache/airflow:pgbouncer-2020.6.31 > >>>>>> > - apache/airflow:pgbouncer-exporter-2020.6.31 > >>>>> > >>>>> Do we count these as "releases" (i.e. does the PMC need to vote on > them) > >>>>>> or not? > >>>>>> > >>>>> > >>>>> I think we should. I believe we should make it a part of the regular > >>>>> release and vote together on "airflow + prod image + helm + dependent > >>>>> images". > >>>>> Then we might release each of those separately if needed, with a > >>>>> separate vote/process (possibly we can bundle together several > different > >>>>> things to release). Hence CalVer might make more sense even if we > release > >>>>> them together with 1.10.x or 2.Y (especially since those deps are > pretty > >>>>> much independent of the Airflow version used). I think for > >>>>> Airflow + Prod > >>>>> image, it makes perfect sense to keep 1.10.*/2.0.*, but for Helm and > >>>>> dependent images CalVer seems like a better idea. > >>>>> > >>>>> > >>>>> For these I think including the upstream version is useful too > (either > >>>>>> as well, or instead) -- that way people can look at the right > version > >>>>>> of > >>>>>> the upstream docs when looking at what configuration options > >>>>>> there are. > >>>>>> So `apache/airflow:pgbouncer-1.8.1-1` or > >>>>>> `apache/airflow:pgbouncer-1.8.1-2020.6.31` (nice date btw :D ) > >>>>>> > >>>>> > >>>>> Agreed. BTW, I wondered if anyone would notice the date ;).
> >>>>> > >>>>> (FYI, for pgbouncer-exporter there are three such projects on GitHub; > >>>>>> Juraj's was picked somewhat randomly.) > >>>>>> > >>>>>> > I think now it's just a matter of following up with the > >>>>>> > releases of pgbouncer, libressl and libressl-dev > >>>>>> > >>>>>> That's still a fairly big "just". And these SSL libraries aren't the > >>>>>> only sources of security patches needed. Also the act of updating is > >>>>>> the > >>>>>> easy part -- it's the notification to know when updates are > >>>>>> needed, and > >>>>>> ensuring that they happen in a timely manner, that is the hard > >>>>>> part :) > >>>>>> > >>>>> > >>>>> True. But I think we have some precedent in our CI/Prod images. We > have > >>>>> it automated already, so that they self-maintain and self-upgrade: > >>>>> https://github.com/apache/airflow/blob/master/CI.rst. The current CI > >>>>> automation is done in such a way that we catch up fairly > >>>>> quickly with > >>>>> the latest Python patches almost without noticing (well, there is > >>>>> a period of a few > >>>>> hours where the builds on CI get slower and people need to > update > >>>>> their Breeze images). But other than that it happens automatically > and > >>>>> without anyone doing any active work there. > >>>>> > >>>>> I can apply a very similar approach to all the images (both dev and > >>>>> runtime) and add a notification component to notify us if any of the > >>>>> upstream deps change. So, from our side, it will mostly be about deciding > >>>>> whether we should release out-of-band or wait for the "regular" > release. > >>>>> > >>>>> J. > >>>>> > >>>>> > >>>>>> On Jun 25 2020, at 11:05 am, Jarek Potiuk <[email protected] > > > >>>>>> wrote: > >>>>>> > >>>>>> > I think I'd feel more comfortable if we have it all under the > >>>>>> "community" > >>>>>> > umbrella. > >>>>>> > > >>>>>> > - For dev images, I think we have a good idea from CouchDB. I > >>>>>> will make > >>>>>> > a POC of that and a PR shortly.
I already created an airflowdev > account > >>>>>> on > >>>>>> > DockerHub and made it available to the Airflow PMC members; I will > >>>>>> connect it > >>>>>> to our > >>>>>> > repo to automate dev dependencies. > >>>>>> > - For the runtime (Astronomer) images, I took a deeper look > >>>>>> and I > >>>>>> think > >>>>>> > it makes perfect sense to add them and have them released by the Airflow > community > >>>>>> > as well: > >>>>>> > > >>>>>> > Here is what is in those images: > >>>>>> > > >>>>>> > - astronomerinc/ap-statsd-exporter > >>>>>> > < > >>>>>> > https://hub.docker.com/layers/astronomerinc/ap-statsd-exporter/latest/images/sha256-69538dc71521489733bb21823505a75a02a4c54d1d07eaa2be9fa7eb58763b7f?context=explore > >>>>>> > > >>>>>> > - this image is just based on the official Prometheus StatsD > >>>>>> > exporter with an > >>>>>> > added file "/etc/statsd-exporter/mappings.yml". So the > maintenance > >>>>>> is > >>>>>> > mainly about keeping the mapping up to date and occasionally upgrading to the latest > >>>>>> released > >>>>>> > prometheus-statsd. The first sounds like a good > >>>>>> > idea for > >>>>>> > community work; the second we can easily automate, the same way > >>>>>> as we > >>>>>> > do for > >>>>>> > production images. It seems that this one is updated once every few > >>>>>> > months, so > >>>>>> > we can easily do that.
astronomerinc/ap-pgbouncer:latest > >>>>>> > - astronomerinc/ap-pgbouncer > >>>>>> > < > >>>>>> > https://hub.docker.com/layers/astronomerinc/ap-pgbouncer/latest/images/sha256-9820007e1e62eb988cb603929b1eaf0989052cd01b73a3004274b21d143f9654?context=explore > >>>>>> > > >>>>>> > - this is just pgbouncer packaged into an image. This one > seems > >>>>>> to have been > >>>>>> > updated more frequently in the past, but I think now it's just a > matter > >>>>>> > of > >>>>>> > following up with the releases of pgbouncer, libressl and > >>>>>> libressl-dev > >>>>>> > > >>>>>> > - astronomerinc/ap-pgbouncer-exporter > >>>>>> > < > >>>>>> > https://hub.docker.com/layers/astronomerinc/ap-pgbouncer-exporter/latest/images/sha256-6e9d4f2d66dafecfd2f29239a1957edb2c953b8299e487ec8b04a96206d2da4e?context=explore > >>>>>> > > >>>>>> > - this is a pgbouncer exporter based on Juraj Bubniak's PgBouncer > >>>>>> Prometheus > >>>>>> > exporter with the libressl and libressl-dev libraries upgraded. It is also > >>>>>> usually > >>>>>> > updated every few months. Here I think it would also make > >>>>>> sense to > >>>>>> bring > >>>>>> > the source code for Juraj's image into the community as well. > >>>>>> > > >>>>>> > I also think it would make sense (unlike for the dev dependencies) to > >>>>>> publish > >>>>>> > all "runtime" deps under the "apache/airflow" repository. That > would > >>>>>> > be a > >>>>>> > bit awkward, but I think it's the lowest-"effort" option for us to > maintain > >>>>>> and > >>>>>> > it makes sure they are officially "blessed" during the release.
> >>>>>> > > >>>>>> > So the proposal I have (if we use CalVer versioning similar to > >>>>>> backport > >>>>>> > packages): > >>>>>> > > >>>>>> > - apache/airflow:statsd-exporter-2020.6.31 > >>>>>> > - apache/airflow:pgbouncer-2020.6.31 > >>>>>> > - apache/airflow:pgbouncer-exporter-2020.6.31 > >>>>>> > > >>>>>> > I am happy to bring it all to our repo and set up automation. > >>>>>> > > >>>>>> > J. > >>>>>> > > >>>>>> > > >>>>>> > > >>>>>> > On Thu, Jun 25, 2020 at 11:19 AM Ash Berlin-Taylor < > [email protected]> > >>>>>> wrote: > >>>>>> > > >>>>>> >> Wow Kamil, that's an awesome and mature process for a company to > >>>>>> take -- > >>>>>> >> I wish more companies treated open source deps that way. > >>>>>> >> > >>>>>> >> As I mentioned in the original Helm PR (but just in a comment > left > >>>>>> on a > >>>>>> >> review), I left a few of the "support" Docker images as > >>>>>> astronomerinc > >>>>>> >> ones, as the upstream Docker images are "unmaintained" (that isn't > >>>>>> to say > >>>>>> >> the projects are, just that the images aren't re-published in a > >>>>>> timely > >>>>>> >> fashion to update OpenSSL etc.). > >>>>>> >> > >>>>>> >> I am happy to replace the astronomerinc support images with > others > >>>>>> if we > >>>>>> >> want to. I am also happy to clarify/make explicit the license > >>>>>> situation > >>>>>> >> that those images are distributed under (Apache 2) if we want to > >>>>>> stick > >>>>>> >> with them and let us (Astronomer) carry the burden of patching > and > >>>>>> >> updating them -- it is, after all, part of what people pay us > >>>>>> for, so > >>>>>> we'll > >>>>>> >> be doing it anyway. > >>>>>> >> > >>>>>> >> > Besides, we should provide the possibility to replace "Object > >>>>>> code" with > >>>>>> >> > other objects i.e., use of an image from a private third-party > >>>>>> registry.
> >>>>>> >> > >>>>>> >> The images to use come from the Helm values, so they are easily > >>>>>> changeable at > >>>>>> >> Helm install/upgrade time: > >>>>>> >> > >>>>>> >> > >>>>>> >> > >>>>>> > https://github.com/apache/airflow/blob/ec0025f35be212b248c284efa04acf2d96845681/chart/values.yaml#L68-L92 > >>>>>> >> > >>>>>> >> -ash > >>>>>> >> > >>>>>> >> On Jun 24 2020, at 9:07 am, Kamil Breguła < > >>>>>> [email protected]> > >>>>>> >> wrote: > >>>>>> >> > >>>>>> >> > These files have no information to determine the license. > >>>>>> In my > >>>>>> opinion, > >>>>>> >> > these images ("Derivative Works") should be treated as > >>>>>> Astronomer's or > >>>>>> >> > other users' copyrighted files. Please note that Astronomer may > >>>>>> >> distribute > >>>>>> >> > the images under a different license, but they need to > >>>>>> acknowledge the > >>>>>> >> use > >>>>>> >> > of the Foundation's or other licensed software. To do otherwise > >>>>>> would be > >>>>>> >> > stealing. > >>>>>> >> > > >>>>>> >> > DockerHub is not an Open Source software registry, and we > cannot > >>>>>> assume > >>>>>> >> > that every image there is available under a license that allows > >>>>>> >> free use. > >>>>>> >> > > >>>>>> >> > **What does this mean for the project?** > >>>>>> >> > > >>>>>> >> > This is incompatible with the Apache license, because each > runtime > >>>>>> >> > dependency must also be under an Apache-compatible > license. > >>>>>> These > >>>>>> >> > images are required to run the Helm Chart, so they are its > dependencies. > >>>>>> >> > Dependencies that are not compatible with the Apache license > >>>>>> are a > >>>>>> >> problem > >>>>>> >> > for our users and prevent the use of this project. > >>>>>> >> > > >>>>>> >> > **How do we deal with this topic in my organization?** > >>>>>> >> > > >>>>>> >> > We take the topic of copyright very seriously in my > organization.
> >>>>>> >> One of > >>>>>> >> > the steps we take before publishing a derivative work based > >>>>>> on an > >>>>>> >> > Open-Source license is to audit the source code to see if each > >>>>>> part is > >>>>>> >> > under a license that allows us to use it. If we build images or > >>>>>> artifacts > >>>>>> >> > automatically, we take steps that prevent the accidental > >>>>>> publication > >>>>>> >> > of an > >>>>>> >> > artifact that could contain works that have an incorrect > license. > >>>>>> >> > > >>>>>> >> > We do this by building an audited internal registry: > >>>>>> >> > - In the case of Airflow, this is a copy of the source code and > >>>>>> the > >>>>>> >> > necessary PIP libraries stored in a blockchain-based registry > >>>>>> >> > (append-only registry). Any change in such a registry > >>>>>> undergoes a > >>>>>> review > >>>>>> >> > process and must be approved. It is not possible to revert an > >>>>>> approved > >>>>>> >> > change without leaving a trace. > >>>>>> >> > - In the case of Docker images, this means that each image is > >>>>>> built > >>>>>> >> > automatically, and no one publishes the images to the image > registry > >>>>>> >> manually > >>>>>> >> > (docker push). No step can download files from a registry > >>>>>> that is > >>>>>> not > >>>>>> >> > auditable. > >>>>>> >> > > >>>>>> >> > Such steps allow you to recreate the software development > process, > >>>>>> >> > e.g. in > >>>>>> >> > the event of a court case. > >>>>>> >> > > >>>>>> >> > In our case, it won't be easy to introduce all of these > >>>>>> requirements, > >>>>>> >> > but we > >>>>>> >> > can try to be compatible with them so that organizations that > >>>>>> have the > >>>>>> >> same > >>>>>> >> > requirements can meet them. > >>>>>> >> > > >>>>>> >> > **What should we do?** > >>>>>> >> > > >>>>>> >> > In my opinion, this is similar to using libraries in our > >>>>>> application.
> >>>>>> >> > We do > >>>>>> >> > not perform a publisher assessment for every library we use. We > >>>>>> only > >>>>>> >> verify > >>>>>> >> > license compliance. > >>>>>> >> > > >>>>>> >> > On the other hand, it looks different because it is "Object > >>>>>> Code", not > >>>>>> >> > "Source Code". We do not use source code directly, but we > >>>>>> use an > >>>>>> object > >>>>>> >> > prepared by a third party ("Derivative Works"). > >>>>>> >> > > >>>>>> >> > In my opinion, relying on any Docker image ("Object Code") > >>>>>> is OK > >>>>>> if it > >>>>>> >> > meets the following requirements: > >>>>>> >> > - The Source Code required to create the object should be > publicly > >>>>>> >> > available and should be compatible with the Apache license. > >>>>>> >> > - We should have access to the Compilation Information. The > >>>>>> Compilation > >>>>>> >> > Information must suffice to ensure that the continued > functioning > >>>>>> >> of the > >>>>>> >> > source code is in no case prevented or interfered with solely > >>>>>> because > >>>>>> >> > modification has been made. > >>>>>> >> > > >>>>>> >> > Besides, we should provide the possibility to replace "Object > >>>>>> code" with > >>>>>> >> > other objects i.e., use of an image from a private third-party > >>>>>> registry. > >>>>>> >> > > >>>>>> >> > Thanks, Jarek, for paying attention to this issue. I didn't think > >>>>>> >> about it > >>>>>> >> > before, but now I know I couldn't use the Helm Chart in its > >>>>>> current > >>>>>> >> > form in > >>>>>> >> > any of my work. I am afraid that many members of our community > >>>>>> >> would face > >>>>>> >> > similar problems if they tried to use it in a production > >>>>>> environment.
> >>>>>> >> > > >>>>>> >> > > >>>>>> >> > On Mon, Jun 22, 2020 at 3:08 PM Ash Berlin-Taylor < > [email protected] > >>>>>> > > >>>>>> >> wrote: > >>>>>> >> > > >>>>>> >> >> Licensing-wise there is no issue from me: the astronomerinc > >>>>>> images are > >>>>>> >> >> just re-packagings of the upstream images to apply security > fixes, > >>>>>> >> so they are > >>>>>> >> >> licensed under whatever the original image is (MIT or Apache 2 > >>>>>> usually; > >>>>>> >> >> otherwise we wouldn't have put them in the Helm chart PR). > >>>>>> >> >> > >>>>>> >> >> For background, the reason that we at Astronomer created > >>>>>> >> >> ap-pgbouncer-exporter in the first place is that the upstream > >>>>>> package > >>>>>> >> >> does not patch/rebuild to address security vulnerabilities. By > >>>>>> taking > >>>>>> >> >> this into airflow-ext, we as a project become > >>>>>> responsible for > >>>>>> >> >> monitoring and testing that. (And don't be fooled into > thinking > >>>>>> the > >>>>>> >> >> free scanners can detect all vulns here; we've found them > >>>>>> to be > >>>>>> >> of very > >>>>>> >> >> variable and questionable accuracy.) > >>>>>> >> >> > >>>>>> >> >> That is a non-trivial amount of work for an open source > project. > >>>>>> >> >> > >>>>>> >> >> Has this ever caused us any problems outside of pip/Python > >>>>>> dependencies? > >>>>>> >> >> (I'm not aware of any.) For runtime this may make sense > >>>>>> (again, I'm > >>>>>> >> >> not yet convinced), but for test-only/dev-only deps this seems > >>>>>> >> like a > >>>>>> >> >> lot of work; that time could be better spent working on > >>>>>> Airflow. If > >>>>>> >> we pin > >>>>>> >> >> the versions of the Docker images we use, then the only real risk is a > >>>>>> left-pad > >>>>>> >> >> scenario of "I'm deleting all my images", which is a minor > risk. > >>>>>> >> >> > >>>>>> >> >> Do any other projects do anything like this? I haven't seen it > >>>>>> before.
> >>>>>> >> >> > >>>>>> >> >> I'd vote for doing nothing and addressing this in specific > cases > >>>>>> >> when it > >>>>>> >> >> becomes a problem, because I do not see using third-party > Docker > >>>>>> images > >>>>>> >> >> as a risk. I see it as a time-saving measure. > >>>>>> >> >> > >>>>>> >> >> -ash > >>>>>> >> >> > >>>>>> >> >> On Jun 22 2020, at 1:42 pm, Jarek Potiuk < > >>>>>> [email protected]> > >>>>>> >> wrote: > >>>>>> >> >> > >>>>>> >> >> > Hello everyone, > >>>>>> >> >> > > >>>>>> >> >> > TL;DR: I noticed that we are accumulating some > >>>>>> dependencies on > >>>>>> >> external > >>>>>> >> >> > binaries (downloads and Docker images) which make the Apache > >>>>>> Airflow > >>>>>> >> >> > community a bit vulnerable. I > would > >>>>>> love > >>>>>> >> your > >>>>>> >> >> > comments/opinions on the proposal I made around this. > >>>>>> >> >> > > >>>>>> >> >> > *More explanation/status:* > >>>>>> >> >> > > >>>>>> >> >> > While depending on artifacts officially "released" and > >>>>>> "managed" by > >>>>>> >> the > >>>>>> >> >> > owning organizations is fine, I think it is a bit risky to depend on > >>>>>> the others > >>>>>> >> long > >>>>>> >> >> > term, and I think we should aim to bring all those > "vulnerable" > >>>>>> >> >> dependencies > >>>>>> >> >> > into community control. > >>>>>> >> >> > > >>>>>> >> >> > I reviewed all our code (or I think all of it!) looking for such > >>>>>> >> dependencies > >>>>>> >> >> > and prepared an "umbrella" issue where I proposed the > approach > >>>>>> >> we can > >>>>>> >> >> take > >>>>>> >> >> > for all such dependencies. > >>>>>> >> >> > > >>>>>> >> >> > I could have missed some, so if you find others, feel > >>>>>> free to > >>>>>> >> comment/add > >>>>>> >> >> > the new ones.
> >>>>>> >> >> > All the details are captured here: > >>>>>> >> >> > https://github.com/apache/airflow/issues/9401 - I discussed > >>>>>> the > >>>>>> >> >> > context/motivation/current status and the approach we can > >>>>>> take for > >>>>>> those > >>>>>> >> >> > dependencies. > >>>>>> >> >> > > >>>>>> >> >> > A lot of those dependencies just need review and maybe some > >>>>>> >> updates to > >>>>>> >> >> > the latest versions. And I do not think there is a lot to > discuss > >>>>>> for > >>>>>> >> those. > >>>>>> >> >> > > >>>>>> >> >> > There is one point, however, that requires more deliberate > >>>>>> >> action and > >>>>>> >> >> some > >>>>>> >> >> > decisions, I think. > >>>>>> >> >> > > >>>>>> >> >> > We have some dependencies on Docker images that we are using > >>>>>> from > >>>>>> >> various > >>>>>> >> >> > sources: > >>>>>> >> >> > 1) officially maintained images > >>>>>> >> >> > 2) images released by organizations that released them for > >>>>>> their own > >>>>>> >> >> > purpose, but they are not "officially maintained" by those > >>>>>> >> organizations > >>>>>> >> >> > 3) images released by private individuals > >>>>>> >> >> > > >>>>>> >> >> > While 1) is perfectly OK, I think for 2) and 3) we should > >>>>>> bring the > >>>>>> >> >> images > >>>>>> >> >> > under Airflow community management. Here is the list of those > >>>>>> >> images I > >>>>>> >> found > >>>>>> >> >> > that need to be moved to Airflow: > >>>>>> >> >> > > >>>>>> >> >> > - aneeshkj/helm-unittest > >>>>>> >> >> > - ashb/apache-rat:0.13-1 > >>>>>> >> >> > - godatadriven/krb5-kdc-server > >>>>>> >> >> > - polinux/stress (?)
> >>>>>> >> >> > - osixia/openldap:1.2.0 > >>>>>> >> >> > - astronomerinc/ap-statsd-exporter:0.11.0 > >>>>>> >> >> > - astronomerinc/ap-pgbouncer:1.8.1 > >>>>>> >> >> > - astronomerinc/ap-pgbouncer-exporter:0.5.0-1 > >>>>>> >> >> > > >>>>>> >> >> > > >>>>>> >> >> > *Proposal*: > >>>>>> >> >> > > >>>>>> >> >> > My proposal is to make a folder in our repository on GitHub > >>>>>> (continuing > >>>>>> >> >> with > >>>>>> >> >> > the mono-repo approach we follow) to keep the corresponding > >>>>>> Dockerfiles > >>>>>> >> and > >>>>>> >> >> > scripts that build and release images from there. Now the > only > >>>>>> >> >> > question is > >>>>>> >> >> > where to keep those images. We currently have apache/airflow, > >>>>>> but I > >>>>>> >> >> > think we > >>>>>> >> >> > should reserve it for Airflow images only and keep > >>>>>> those > >>>>>> >> images > >>>>>> >> >> > elsewhere. Unfortunately, we cannot have "sub-images" of any > >>>>>> >> sort in > >>>>>> >> >> > DockerHub. We are already abusing the "apache/airflow" > >>>>>> >> >> namespace a bit, as > >>>>>> >> >> > we are keeping both CI and production images there (but > that's > >>>>>> quite > >>>>>> >> >> > OK as > >>>>>> >> >> > the images are similar). > >>>>>> >> >> > > >>>>>> >> >> > My proposal is to create an *"apache/airflow-ext"* > >>>>>> DockerHub > >>>>>> >> >> > repository and keep the images there. It will also be a > >>>>>> little > >>>>>> >> >> > abused, because we will have to distinguish the images by tags, for > >>>>>> example: > >>>>>> >> >> > > >>>>>> >> >> > - apache/airflow-ext:helm-unittest-[version] > >>>>>> >> >> > - apache/airflow-ext:apache-rat-[version] > >>>>>> >> >> > > >>>>>> >> >> > I am also open to other names for the repo and to proposals for > other > >>>>>> ways > >>>>>> >> >> > to > >>>>>> >> >> > handle that.
> >>>>>> >> >> > > >>>>>> >> >> > I believe there is no issue with licences for any of > those > >>>>>> images > >>>>>> >> >> (Ash, > >>>>>> >> >> > Kaxil, Fokko: some of the images are > >>>>>> Astronomer's/GoDataDriven's, > >>>>>> >> >> > can you comment on that?), and I believe the licensing of all > >>>>>> those > >>>>>> >> >> > images is > >>>>>> >> >> > OK for us to copy with attribution (I will double-check that > >>>>>> for other > >>>>>> >> >> > images). > >>>>>> >> >> > > >>>>>> >> >> > WDYT? > >>>>>> >> >> > > >>>>>> >> >> > J. > >>>>>> >> >> > > >>>>>> >> >> > > >>>>>> >> >> > > >>>>>> >> >> > -- > >>>>>> >> >> > > >>>>>> >> >> > Jarek Potiuk > >>>>>> >> >> > Polidea <https://www.polidea.com/> | Principal Software > >>>>>> Engineer > >>>>>> >> >> > > >>>>>> >> >> > M: +48 660 796 129 <+48660796129> > >>>>>> >> >> > [image: Polidea] <https://www.polidea.com/>
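Point 5 of the July 5 proposal quoted above describes tags of the form `apache/airflow:airflow-<component>-<CalVer date>-<upstream version>`, and the earlier June 25 draft used the date 2020.6.31, which does not exist. A minimal bash sketch, assuming the build scripts compose tags this way, that builds the tag and refuses impossible CalVer dates before anything could be pushed. The function names `is_valid_calver` and `airflow_image_tag` are hypothetical and are not part of the Airflow repositories.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of any Airflow repository): compose an image
# tag following the naming convention from the thread and refuse impossible
# CalVer dates such as 2020.6.31.
set -euo pipefail

# Succeeds only if $1 is a real calendar date written as YYYY.MM.DD or YYYY.M.D.
is_valid_calver() {
  local y m d max_day
  [[ "$1" =~ ^([0-9]{4})\.([0-9]{1,2})\.([0-9]{1,2})$ ]] || return 1
  # Force base 10 so that "08" and "09" are not parsed as invalid octal numbers.
  y=$((10#${BASH_REMATCH[1]}))
  m=$((10#${BASH_REMATCH[2]}))
  d=$((10#${BASH_REMATCH[3]}))
  (( m >= 1 && m <= 12 )) || return 1
  case "$m" in
    4 | 6 | 9 | 11) max_day=30 ;;
    2)
      # Gregorian leap-year rule.
      if (( (y % 4 == 0 && y % 100 != 0) || y % 400 == 0 )); then
        max_day=29
      else
        max_day=28
      fi
      ;;
    *) max_day=31 ;;
  esac
  (( d >= 1 && d <= max_day ))
}

# Usage: airflow_image_tag USER REPO COMPONENT CALVER UPSTREAM_VERSION
airflow_image_tag() {
  local user="$1" repo="$2" component="$3" calver="$4" upstream="$5"
  if ! is_valid_calver "$calver"; then
    echo "invalid CalVer date: ${calver}" >&2
    return 1
  fi
  printf '%s/%s:airflow-%s-%s-%s\n' "$user" "$repo" "$component" "$calver" "$upstream"
}

# Prints apache/airflow:airflow-pgbouncer-2020.07.10-1.14.0
airflow_image_tag apache airflow pgbouncer 2020.07.10 1.14.0
```

With this check in place, `airflow_image_tag apache airflow pgbouncer 2020.6.31 1.8.1` fails with an error instead of producing a tag, because June has only 30 days.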
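Point 6 of the July 5 proposal lists four labels per image, and the example build script passes the versions and `COMMIT_SHA` as `--build-arg`s, which suggests the Dockerfile (not shown in the thread) turns them into `LABEL`s. As an alternative sketch, the same labels could be generated in the script and passed to `docker build --label`. The helper name `airflow_label_args` is hypothetical, and generalizing the key pattern `org.apache.airflow.<component>.version` from the single pgbouncer example is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emit the "--label key=value" arguments matching the
# labeling scheme from the thread. The key pattern is generalized from the
# single pgbouncer example and is an assumption.
set -euo pipefail

# Usage: airflow_label_args COMPONENT CALVER UPSTREAM_VERSION COMMIT_SHA
# Prints one argument per line so the caller can load them into an array.
airflow_label_args() {
  local component="$1" calver="$2" upstream="$3" sha="$4"
  printf '%s\n' \
    --label "org.apache.airflow.airflow_${component}.version=${calver}" \
    --label "org.apache.airflow.commit_sha=${sha}" \
    --label "org.apache.airflow.component=${component}" \
    --label "org.apache.airflow.${component}.version=${upstream}"
}

mapfile -t LABEL_ARGS < <(airflow_label_args pgbouncer 2020.07.10 1.14.0 \
  43e6406a84d2589bd54c3c37ceaa0c3ebaa9de26)

# Print the arguments instead of running Docker; in a real build script they
# would be passed as: docker build . "${LABEL_ARGS[@]}" --tag "${TAG}"
printf '%s\n' "${LABEL_ARGS[@]}"
```

Once an image is built, a single label can be read back with `docker inspect --format '{{ index .Config.Labels "org.apache.airflow.commit_sha" }}' <image>`, which ties the image to the commit it was built from.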
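On June 25 Jarek suggests adding a notification component that flags when upstream dependencies change, and Ash points out that knowing when an update is needed is the hard part. A minimal sketch of just the comparison step, assuming the latest upstream version has already been obtained somehow (how to query each upstream's releases is left open). It relies on GNU coreutils `sort -V` (version sort), which orders 1.8.1 before 1.14.0, whereas a plain string comparison would not. The function name `needs_bump` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check for a notification job: is the pinned upstream version
# older than the latest known release? Requires GNU coreutils "sort -V"
# (version sort). Fetching the latest version is out of scope here.
set -euo pipefail

# Usage: needs_bump PINNED_VERSION LATEST_VERSION
# Succeeds (exit 0) only when LATEST_VERSION is strictly newer than PINNED_VERSION.
needs_bump() {
  local pinned="$1" latest="$2" newest
  [[ "$pinned" != "$latest" ]] || return 1
  # Version-sort both values; the last line is the newer one.
  newest="$(printf '%s\n%s\n' "$pinned" "$latest" | sort -V | tail -n 1)"
  [[ "$newest" == "$latest" ]]
}

# The pgbouncer versions mentioned in the thread: 1.8.1 (astronomerinc image)
# and 1.14.0 (the rebuilt image).
if needs_bump 1.8.1 1.14.0; then
  echo "pgbouncer: 1.8.1 -> 1.14.0 available"
fi
```

A scheduled CI job could run such a check for each image and open an issue or send a message when it succeeds, leaving the decision between an out-of-band release and waiting for the regular one to the maintainers.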
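In the June 22 exchange Ash argues that pinning the versions of the Docker images in use leaves only a minor "left-pad" risk, and Kamil later stresses auditability. A tag can be re-pushed with different content, while a digest reference (`image@sha256:...`) identifies exactly one image. A minimal sketch, assuming "pinned" means either an explicit non-`latest` tag or a digest, that classifies the image references from the June 22 list. The function name `image_pin_kind` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical audit helper: classify Docker image references as pinned by
# digest, pinned by tag, or floating (no tag, or "latest").
set -euo pipefail

# Usage: image_pin_kind IMAGE_REFERENCE  -> prints digest | tag | floating
image_pin_kind() {
  local ref="$1" last
  # Only the last path component can carry a tag; this avoids mistaking a
  # registry port (e.g. "localhost:5000/img") for a tag.
  last="${ref##*/}"
  if [[ "$ref" =~ @sha256:[0-9a-f]{64}$ ]]; then
    echo "digest"
  elif [[ "$last" == *:* && "${last##*:}" != "latest" ]]; then
    echo "tag"
  else
    echo "floating"
  fi
}

# Image references listed in the June 22 message.
for ref in aneeshkj/helm-unittest ashb/apache-rat:0.13-1 \
  godatadriven/krb5-kdc-server polinux/stress osixia/openldap:1.2.0 \
  astronomerinc/ap-pgbouncer:1.8.1; do
  printf '%-34s %s\n' "$ref" "$(image_pin_kind "$ref")"
done
```

Under this classification, a tag such as `1.8.1` is pinned only by convention; a digest is the stronger guarantee if the goal is to reproduce exactly what was deployed.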
