One drawback of not doing this is that we really depend on the original
repo. It can be removed at any point in time and we might have a hard time
restoring it (the infamous "leftpad
<https://www.theregister.com/2016/03/23/npm_left_pad_chaos/>" case). It's a
bit different, but IF we want to make sure we can always rebuild the image,
this might not be the case if the original repo is deleted. This might
happen at any time; we have zero control over it, and there is no one
"standing behind" official maintenance of it (which I do not like).

If the origin is not official but privately owned, having a forked copy
of the sources protects us in case that "private" someone deletes their
repository. I think we need to carefully weigh that risk when making a
decision like this. We can make the copy and add our commit on top (the
licence allows that). So taking a bit more space in GitHub (which we are
not paying for) is the whole cost of keeping the fork (and likely not even
that, depending on how GitHub caches the same commits coming from
pushes).

We do not have to keep porting/merging it. The commit only adds the code,
and we have our own (small) Dockerfile that bakes the binary in (plus a
script to build the binary). So this is really a matter of rebasing it IF
we decide to migrate. We don't have to. We might simply rebuild it without
migrating to the latest versions (taking the upgraded alpine image as a
base). We have the freedom to do what we decide, but also the freedom to do
nothing at all. But we as a community are in control.

Which is what I really like.
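
To illustrate the "rebuild without migrating" option, here is a minimal
sketch. It assumes a hypothetical BASE_IMAGE build arg in our Dockerfile and
uses "alpine:3.12" purely as an example of an upgraded base; neither is
taken from the actual repos. The tag follows the naming convention from my
earlier mail, and by default the script only prints the docker command
instead of running it.

```shell
set -eu

# Compose the image tag: <user>/<repo>:airflow-<component>-<calver>-<upstream version>
image_tag() {
    printf '%s/%s:airflow-%s-%s-%s\n' \
        "${DOCKERHUB_USER:-apache}" "${DOCKERHUB_REPO:-airflow}" "$1" "$2" "$3"
}

# Rebuild from our own copy of the sources on a given base image.
# $1=component $2=calver of our image $3=upstream version $4=base image
# BASE_IMAGE is a hypothetical build arg (the Dockerfile would need "ARG BASE_IMAGE").
# Only prints the command unless DRY_RUN=false is set.
rebuild() {
    tag="$(image_tag "$1" "$2" "$3")"
    if [ "${DRY_RUN:-true}" = "true" ]; then
        echo "docker build . --pull --build-arg BASE_IMAGE=$4 --tag ${tag}"
    else
        docker build . --pull --build-arg "BASE_IMAGE=$4" --tag "${tag}"
    fi
}

# Same pgbouncer version as before, only the base image changes.
rebuild pgbouncer 2020.07.10 1.14.0 alpine:3.12
```

Nothing here needs the upstream repo to still exist: the sources come from
our copy, and only the base image is pulled.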

J.

On Mon, Jul 6, 2020 at 2:42 PM Ash Berlin-Taylor <[email protected]> wrote:

> Yeah I figured that from looking at the commits -- but I think even if
> it was a proper fork I wouldn't be a fan of this approach: we'd have
> to keep "porting"/merging our changes to update from upstream.
>
> -ash
>
> On Jul 6 2020, at 1:36 pm, Jarek Potiuk <[email protected]> wrote:
>
> > Sure - we could do that as well if we agree on that.
> >
> > Just to explain - the repository is really a "fork" of the original one
> > with our modifications on top. The only reason it's not an "actual" github
> > fork is that I cannot do a fork in the "apache" organisation.
> >
> > J.
> >
> >
> > On Mon, Jul 6, 2020 at 2:22 PM Ash Berlin-Taylor <[email protected]> wrote:
> >
> >> I've just taken a look at the
> >> https://github.com/apache/airflow-pgbouncer-exporter (I'm assuming the
> >> others are the same) and "woah, wait" was my reaction.
> >>
> >> Having a repo where we include the Dockerfile and build scripts: I'm
> >> okay with that.
> >>
> >> This approach where we have an entire copy of the code and have
> >> essentially forked the upstream project: not happy verging on a
> >> -1/veto of this approach.
> >>
> >> I.e. I'd prefer this repo was just a Dockerfile that pulls the upstream
> >> project from a published release/git tag/pinned commit sha.
> >>
> >> -ash
> >>
> >> On Jul 6 2020, at 12:46 pm, Jarek Potiuk <[email protected]> wrote:
> >>
> >> > One more comment. I started the discussion in the build devlist of Apache:
> >> >
> >> > https://lists.apache.org/thread.html/rf2af2a95e7687fe94ede23fe9df388f784c8231a5968b109f677cbe8%40%3Cbuilds.apache.org%3E
> >> > - and so far there are no conclusive answers. It is something that is not
> >> > regulated clearly by ASF rules, it seems.
> >> >
> >> > So it seems to me we are free to choose what our approach is (for now).
> >> >
> >> > But I have found this at least:
> >> >
> >> > https://www.apache.org/legal/release-policy.html#what
> >> >
> >> > "The Apache Software Foundation produces open source software. All releases
> >> > are in the form of the source materials needed to make changes to the
> >> > software being released. In some cases, binary/bytecode packages are also
> >> > produced as a convenience to users that might not have the appropriate
> >> > tools to build a compiled version of the source. In all such cases, the
> >> > binary/bytecode package must have the same version number as the source
> >> > release and may only add binary/bytecode files that are the result of
> >> > compiling that version of the source code release."
> >> >
> >> > I think "the spirit" of that chapter is something that I am referring
> >> > to - from the beginning of the thread.
> >> >
> >> > I really think if we give our users a convenient way of using some binary
> >> > packages (i.e. docker images) there should be an easy way to reproduce
> >> > those from sources. I have the feeling that my proposal is simply an
> >> > embodiment of that rule. Glad to hear what others think about it. I am fully
> >> > aware it is a "gray" area, but I think with very little cost we can move
> >> > it to the "white" area.
> >> >
> >> > J.
> >> >
> >> >
> >> >
> >> > On Sun, Jul 5, 2020 at 11:42 AM Jarek Potiuk <[email protected]>
> >> > wrote:
> >> >
> >> >> Hello Everyone,
> >> >>
> >> >> TL;DR: I did some experiments with those images and I have a workable
> >> >> proposal on how we can handle that.
> >> >>
> >> >> I already created a few repos to see how it can work and I think I have a
> >> >> workable and rather easy to maintain solution. We can still delete this
> >> >> if we choose another way, of course. I just wanted to make sure all below
> >> >> is "workable", so I simply implemented a complete, working solution. It's
> >> >> not that complex, but it's good I did it - I found a few things that
> >> >> had to be fixed in Dockerfiles and build scripts provided by upstream
> >> >> repos, and I also made sure that we are using the latest patched versions of
> >> >> all the tools. In all cases we can rebuild everything from sources - we do
> >> >> not have to rely on some binary that we trust was built from the sources
> >> >> (other than official images).
> >> >>
> >> >> Happy to hear any comments, but I propose that if the below looks good to
> >> >> you, we get a lazy consensus and I simply implement and document it. I
> >> >> would also make it a rule for our images that we keep that approach for
> >> >> future images.
> >> >>
> >> >> *More details:*
> >> >>
> >> >> 1) I brought all the images to the "apache/airflow" DockerHub registry: both
> >> >> dev images and the ones used in the chart. I tried to have a
> >> >> separate "airflowdev" user but it turns out to be not really good - it's
> >> >> either a one-user account or an organization with up to three people for free.
> >> >> That would be a bit of a hassle to manage with 2-factor authentication etc.
> >> >> I think it's actually quite good to have an
> >> >> "apache/airflow:helm-unittest-2020.07.10-v0.2.0-v3.1.2" image. Docker
> >> >> works well in this setup and I think it's rather nice to have all the
> >> >> images in one registry.
> >> >>
> >> >> 2) We have three more repos where I cloned the code for those images that
> >> >> required the "whole" repo and made them standalone - i.e. depending only on
> >> >> official images/binaries released by the organizations "owning" the code in
> >> >> question and the code that is officially released in the official "apt" or
> >> >> "apk" (alpine) repositories. I made some airflow specific modifications
> >> >> there (labels, maintainer, sometimes some configuration changes, build
> >> >> scripts). Those changes are merged as separate commits - we should be able
> >> >> to bring upstream changes from those repos rather easily if we want. Those
> >> >> are the repos:
> >> >>
> >> >> * https://github.com/apache/airflow-pgbouncer-exporter
> >> >> * https://github.com/apache/airflow-openldap
> >> >> * https://github.com/apache/airflow-helm-unittest
> >> >>
> >> >> 3) For those images that did not require a whole separate repository, I
> >> >> created scripts/Dockerfile folders in those two PRs: "chart/dockerfiles
> >> >> <https://github.com/apache/airflow/pull/9650>" directory for "helm"
> >> >> images and "scripts/ci/dockerfiles
> >> >> <https://github.com/apache/airflow/pull/9652>" for CI images.
> >> >>
> >> >> 4) All the images are based either on "alpine" or "debian-slim" or
> >> >> "ubuntu-slim" images and they are optimized for size.
> >> >>
> >> >> 5) All the images keep similar naming conventions and have similar build
> >> >> scripts that you can simply run to rebuild the images from scratch (bumping
> >> >> the versions, bringing upstream changes before as needed). An example build
> >> >> script is below. It will be very easy to upgrade those images as needed and
> >> >> release them separately or all at the same time. Example naming convention:
> >> >>
> >> >> *apache/airflow:airflow-pgbouncer-2020.07.10-1.14.0*
> >> >>
> >> >> Legend:
> >> >>
> >> >> * *pgbouncer* image released by airflow
> >> >> * *1.14.0* - version of pgbouncer
> >> >> * *2020.07.10* - calver version of the image (roughly - the time when the
> >> >> image was released/created by Airflow)
> >> >>
> >> >>
> >> >> 6) All images have a consistent labeling scheme - including commit SHA
> >> >> used to generate the image:
> >> >>
> >> >>     "Labels": {
> >> >>         "org.apache.airflow.airflow_pgbouncer.version": "2020.07.10",
> >> >>         "org.apache.airflow.commit_sha": "43e6406a84d2589bd54c3c37ceaa0c3ebaa9de26",
> >> >>         "org.apache.airflow.component": "pgbouncer",
> >> >>         "org.apache.airflow.pgbouncer.version": "1.14.0"
> >> >>     }
> >> >>
> >> >>
> >> >> 7) No regular maintenance is needed for CI images - we can bump them from
> >> >> time to time on an ad-hoc basis or when we need to increase version. For
> >> >> Helm images I think we should release new versions of those images every
> >> >> time we release Helm chart - we can then rebuild the images using the
> >> >> latest patches of debian/alpine and latest versions of the software we have
> >> >> in them.
> >> >>
> >> >> 8) Example build script
> >> >>
> >> >> #!/usr/bin/env bash
> >> >> # Licensed to the Apache Software Foundation (ASF) under one
> >> >> # ... licence here
> >> >> set -euo pipefail
> >> >> DOCKERHUB_USER=${DOCKERHUB_USER:="apache"}
> >> >> DOCKERHUB_REPO=${DOCKERHUB_REPO:="airflow"}
> >> >> PGBOUNCER_VERSION="1.14.0"
> >> >> AIRFLOW_PGBOUNCER_VERSION="2020.07.10"
> >> >> COMMIT_SHA=$(git rev-parse HEAD)
> >> >>
> >> >> cd "$( dirname "${BASH_SOURCE[0]}" )" || exit 1
> >> >>
> >> >> TAG="${DOCKERHUB_USER}/${DOCKERHUB_REPO}:airflow-pgbouncer-${AIRFLOW_PGBOUNCER_VERSION}-${PGBOUNCER_VERSION}"
> >> >>
> >> >> docker build . \
> >> >>     --pull \
> >> >>     --build-arg "PGBOUNCER_VERSION=${PGBOUNCER_VERSION}" \
> >> >>     --build-arg "AIRFLOW_PGBOUNCER_VERSION=${AIRFLOW_PGBOUNCER_VERSION}" \
> >> >>     --build-arg "COMMIT_SHA=${COMMIT_SHA}" \
> >> >>     --tag "${TAG}"
> >> >>
> >> >> docker push "${TAG}"
> >> >>
> >> >>
> >> >> J.
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> On Thu, Jul 2, 2020 at 2:12 PM Jarek Potiuk <
> [email protected]>
> >> >> wrote:
> >> >>
> >> >>> And the right Greg here :(,
> >> >>>
> >> >>> J.
> >> >>>
> >> >>>
> >> >>>
> >> >>> On Thu, Jul 2, 2020 at 12:18 PM Jarek Potiuk <
> [email protected]
> >> >
> >> >>> wrote:
> >> >>>
> >> >>>> Hey Ash, Greg, Daniel,
> >> >>>>
> >> >>>> So I understand there is no problem with licenses for those
> >> images and
> >> >>>> we can get/use the sources for those?
> >> >>>>
> >> >>>> I would love to add the scripts/Dockerfiles to the sources - to be
> >> able
> >> >>>> to rebuild the images. I have some of those already and would like
> >> >>>> to make
> >> >>>> a  PR, but It would be great if we can get the Dockerfile sources.
> >> >>>> I also
> >> >>>> want to ask a few questions about versions of the base images (some
> >> >>>> of the
> >> >>>> base images seem to be quite old and there are newer releases so I
> >> wanted
> >> >>>> to check if there is anything to prevent upgrading them).
> >> >>>>
> >> >>>> J
> >> >>>>
> >> >>>>
> >> >>>> On Thu, Jun 25, 2020 at 12:58 PM Jarek Potiuk <
> >> [email protected]>
> >> >>>> wrote:
> >> >>>>
> >> >>>>>
> >> >>>>> On Thu, Jun 25, 2020 at 12:27 PM Ash Berlin-Taylor <
> [email protected]>
> >> >>>>> wrote:
> >> >>>>>
> >> >>>>>> > - apache/airflow:statstd-exporter-2020.6.31
> >> >>>>>> > - apache/airflow:pgbouncer-2020.6.31
> >> >>>>>> > - apache/airflow:pgbouncer-exporter-2020.6.31
> >> >>>>>
> >> >>>>>> Do we count these as "releases" (i.e. do the PMC need to vote on them)
> >> >>>>>> or not?
> >> >>>>>>
> >> >>>>>
> >> >>>>> I think we should. I believe we should make it a part of regular
> >> >>>>> release and vote together on "airflow + prod image + helm +
> dependent
> >> >>>>> images".
> >> >>>>> Then we might release each of those separately if needed -  with
> >> >>>>> separate voting/process (possibly we can bundle together several
> >> different
> >> >>>>> things to release). Hence CalVer might make more sense even if we
> >> release
> >> >>>>> them together with 1.10.x or 2.Y (especially that those deps are
> >> pretty
> >> >>>>> much independent from the airflow version used). I think for
> >> >>>>> Airflow + Prod
> >> >>>>> image, it makes perfect sense to keep 1.10.* 2.0.* - but for
> >> Helm and
> >> >>>>> dependent images - CalVer seems like a better idea.
> >> >>>>>
> >> >>>>>
> >> >>>>>> For these I think including the upstream version is useful too (either
> >> >>>>>> as well, or instead) -- that way people can look at the right version of
> >> >>>>>> the upstream docs when looking at what configuration options there are.
> >> >>>>>> So `apache/airflow:pgbouncer-1.8.1-1` or
> >> >>>>>> `apache/airflow:pgbouncer-1.8.1-2020.6.31` (nice date btw :D )
> >> >>>>>>
> >> >>>>>
> >> >>>>> Agree. BTW. I wondered if anyone notices the date ;).
> >> >>>>>
> >> >>>>>> (FYI For pgbouncer-exporter there are three such projects on github,
> >> >>>>>> Juraj's was picked somewhat randomly)
> >> >>>>>>
> >> >>>>>> > I think now it's the matter of just following up with the
> >> >>>>>> > releases of pgbouncer and libressl and libressl-dev
> >> >>>>>>
> >> >>>>>> That's still a fairly big "just". And the ssl libraries aren't the
> >> >>>>>> only sources of security patches needed. Also the act of updating is the
> >> >>>>>> easy part -- it's the notification to know when updates are needed, and
> >> >>>>>> ensuring that they happen in a timely manner, that is the hard part :)
> >> >>>>>>
> >> >>>>>
> >> >>>>> True. But I think we have some precedent in our CI/Prod images. We have
> >> >>>>> it currently automated so that they self-maintain and self-upgrade:
> >> >>>>> https://github.com/apache/airflow/blob/master/CI.rst. The current CI
> >> >>>>> automation is done in such a way that we are catching up fairly quickly with
> >> >>>>> the latest python patches - almost without noticing (well, there is a
> >> >>>>> period of a few hours where the builds on CI get slower and people need to
> >> >>>>> update their Breeze images). But other than that it happens automatically
> >> >>>>> and without anyone doing any active work there.
> >> >>>>>
> >> >>>>> I can do a very similar approach for all the images (both dev and
> >> >>>>> runtime) and add a notification component to notify if any of the
> >> >>>>> upstream deps change. So it will be - from our side - mostly deciding
> >> >>>>> if we should release it out-of-band or wait for a "regular" release.
> >> >>>>>
> >> >>>>> J.
> >> >>>>>
> >> >>>>>
> >> >>>>>> On Jun 25 2020, at 11:05 am, Jarek Potiuk <
> [email protected]
> >> >
> >> >>>>>> wrote:
> >> >>>>>>
> >> >>>>>> > I think I'd feel more comfortable if we have it all under the "community"
> >> >>>>>> > umbrella.
> >> >>>>>> >
> >> >>>>>> >   - For dev images - I think we have a good idea from couchdb. I will make
> >> >>>>>> >   a POC of that and PR shortly. I already created an airflowdev account on
> >> >>>>>> >   Dockerhub and will make it available to PMCs of Airflow and connect it to our
> >> >>>>>> >   repo to automate Dev dependencies.
> >> >>>>>> >   - For the runtime (astronomer) images I took a deeper look and I think
> >> >>>>>> >   it makes perfect sense to add them and release by Airflow Community
> >> >>>>>> >   as well:
> >> >>>>>> >
> >> >>>>>> > Here is what is in those images:
> >> >>>>>> >
> >> >>>>>> >   - astronomerinc/ap-statsd-exporter
> >> >>>>>> >   <https://hub.docker.com/layers/astronomerinc/ap-statsd-exporter/latest/images/sha256-69538dc71521489733bb21823505a75a02a4c54d1d07eaa2be9fa7eb58763b7f?context=explore>
> >> >>>>>> >   - this image is just based on the official Prometheus Statsd exporter with
> >> >>>>>> >   added file "/etc/statsd-exporter/mappings.yml". So the maintenance is
> >> >>>>>> >   mainly about keeping the mapping and possibly upgrading to the latest released
> >> >>>>>> >   prometheus-statsd occasionally. The first one sounds like a good idea for
> >> >>>>>> >   community work, the second we can easily automate - same way as we do for
> >> >>>>>> >   production images. Seems that this one is updated once every few months, so
> >> >>>>>> >   we can easily do that. astronomerinc/ap-pgbouncer:latest
> >> >>>>>> >   - astronomerinc/ap-pgbouncer
> >> >>>>>> >   <
> >> >>>>>>
> >>
> https://hub.docker.com/layers/astronomerinc/ap-pgbouncer/latest/images/sha256-9820007e1e62eb988cb603929b1eaf0989052cd01b73a3004274b21d143f9654?context=explore
> >> >>>>>> >
> >> >>>>>> >   - this is just packaging pgbouncer into an image - this one
> >> seems
> >> >>>>>> to be
> >> >>>>>> >   updated more frequently in the past but I think now it's the
> >> matter
> >> >>>>>> > of just
> >> >>>>>> >   following up with the releases of pgbouncer and libressl and
> >> >>>>>> lbressl-dev
> >> >>>>>> >
> >> >>>>>> >   <
> >> >>>>>>
> >>
> https://hub.docker.com/layers/astronomerinc/ap-pgbouncer-exporter/latest/images/sha256-6e9d4f2d66dafecfd2f29239a1957edb2c953b8299e487ec8b04a96206d2da4e?context=explore
> >> >>>>>> >
> >> >>>>>> >   - astronomerinc/ap-pgbouncer-exporter
> >> >>>>>> >   <
> >> >>>>>>
> >>
> https://hub.docker.com/layers/astronomerinc/ap-pgbouncer-exporter/latest/images/sha256-6e9d4f2d66dafecfd2f29239a1957edb2c953b8299e487ec8b04a96206d2da4e?context=explore
> >> >>>>>> >
> >> >>>>>> >   - this is pgbouncer exporter based on Juraj Bubniak's
> PGBouncer
> >> >>>>>> Prometheus
> >> >>>>>> >   exporter with libressl and libressl-dev library upgraded.
> Also
> >> >>>>>> usually
> >> >>>>>> >   updated every few months. Here I think it would also make
> >> >>>>>> sense to
> >> >>>>>> bring
> >> >>>>>> >   the source code in to the community for Juraj's image as
> well.
> >> >>>>>> >
> >> >>>>>> > I also think it would make sense (unlike the dev
> >> dependencies) to
> >> >>>>>> publish
> >> >>>>>> > all "runtime" devs under the "apache/airflow" repository. That
> >> would
> >> >>>>>> > be a
> >> >>>>>> > bit awkward, but I think it's the least "effort" we need to
> >> maintain
> >> >>>>>> and
> >> >>>>>> > make sure it is officially "blessed" during the release.
> >> >>>>>> >
> >> >>>>>> > So the proposal I have (if we use calver versioning similar to
> >> >>>>>> backport
> >> >>>>>> > packages):
> >> >>>>>> >
> >> >>>>>> >   - apache/airflow:statstd-exporter-2020.6.31
> >> >>>>>> >   - apache/airflow:pgbouncer-2020.6.31
> >> >>>>>> >   - apache/airflow:pgbouncer-exporter-2020.6.31
> >> >>>>>> >
> >> >>>>>> > I am happy to bring it all to our repo and setup automation.
> >> >>>>>> >
> >> >>>>>> > J.
> >> >>>>>> >
> >> >>>>>> >
> >> >>>>>> >
> >> >>>>>> > On Thu, Jun 25, 2020 at 11:19 AM Ash Berlin-Taylor <
> >> [email protected]>
> >> >>>>>> wrote:
> >> >>>>>> >
> >> >>>>>> >> Wow Kamil that's an awesome and mature process for a company to take --
> >> >>>>>> >> I wish more companies treated open source deps that way.
> >> >>>>>> >>
> >> >>>>>> >> As I mentioned in the original Helm PR (but just in a comment
> >> left
> >> >>>>>> to a
> >> >>>>>> >> review), I left a few of the "support" Docker images as
> >> >>>>>> astronomerinc
> >> >>>>>> >> ones as the upstream Docker images are "unmaintained" (that
> isn't
> >> >>>>>> to say
> >> >>>>>> >> the projects are, just that the images aren't re-published
> >> in a
> >> >>>>>> timely
> >> >>>>>> >> fashion to update openssl etc.)
> >> >>>>>> >>
> >> >>>>>> >> I am happy to replace the astronomerinc support images with
> >> others
> >> >>>>>> if we
> >> >>>>>> >> want to. I am also happy to clarify/make explicit the license
> >> >>>>>> situation
> >> >>>>>> >> that those images are distributed under (Apache 2) if we
> >> want to
> >> >>>>>> stick
> >> >>>>>> >> with them and let us (Astronomer) carry the burden of patching
> >> and
> >> >>>>>> >> updating them -- it is after all part of what people pay us
> >> >>>>>> for so
> >> >>>>>> we'll
> >> >>>>>> >> be doing it anyway.
> >> >>>>>> >>
> >> >>>>>> >> > Besides, we should provide the possibility to replace
> "Object
> >> >>>>>> code" with
> >> >>>>>> >> > other objects i.e., use of an image from a private
> third-party
> >> >>>>>> registry.
> >> >>>>>> >>
> >> >>>>>> >> The images to use come from the helm values, so are easily
> >> >>>>>> changable at
> >> >>>>>> >> helm install/upgrade time:
> >> >>>>>> >>
> >> >>>>>> >>
> >> >>>>>> >>
> >> >>>>>>
> >>
> https://github.com/apache/airflow/blob/ec0025f35be212b248c284efa04acf2d96845681/chart/values.yaml#L68-L92
> >> >>>>>> >>
> >> >>>>>> >> -ash
> >> >>>>>> >>
> >> >>>>>> >> On Jun 24 2020, at 9:07 am, Kamil Breguła <
> >> >>>>>> [email protected]>
> >> >>>>>> >> wrote:
> >> >>>>>> >>
> >> >>>>>> >> > These files have no information to determine the license.
> >> >>>>>> In my
> >> >>>>>> opinion,
> >> >>>>>> >> > these images ("Derivative Works") should be treated as
> >> >>>>>> Astronomer's or
> >> >>>>>> >> > other users' copyrighted files. Please note that
> >> Astronomer may
> >> >>>>>> >> distribute
> >> >>>>>> >> > the images under a different license, but they need to
> >> >>>>>> acknowledge the
> >> >>>>>> >> use
> >> >>>>>> >> > of the Foundation or other licensed software. To do
> otherwise
> >> >>>>>> would be
> >> >>>>>> >> > stealing.
> >> >>>>>> >> >
> >> >>>>>> >> > DockerHub is not an Open Source software registry, and we
> >> cannot
> >> >>>>>> assume
> >> >>>>>> >> > that every image there is available under a license that
> allows
> >> >>>>>> >> free use.
> >> >>>>>> >> >
> >> >>>>>> >> > **What does this mean for the project?**
> >> >>>>>> >> >
> >> >>>>>> >> > This is incompatible with the Apache license because each runtime
> >> >>>>>> >> > dependency must also be based on an Apache-compatible license. These
> >> >>>>>> >> > images are required to run the Helm Chart, so they are its dependencies.
> >> >>>>>> >> > Dependencies that are not compatible with the Apache license are a problem
> >> >>>>>> >> > for our users and prevent the use of this project.
> >> >>>>>> >> >
> >> >>>>>> >> > **How do we deal with this topic in my organization?**
> >> >>>>>> >> >
> >> >>>>>> >> > We take the topic of copyright very seriously in my
> >> organization.
> >> >>>>>> >> One of
> >> >>>>>> >> > the steps we take before publishing a derivative work based
> >> >>>>>> on an
> >> >>>>>> >> > Open-Source license is to audit the source code to see if
> each
> >> >>>>>> part is
> >> >>>>>> >> > under a license that allows us to use it. If we build
> >> images or
> >> >>>>>> artifacts
> >> >>>>>> >> > automatically, we take steps that prevent the accidental
> >> >>>>>> publication
> >> >>>>>> >> > of an
> >> >>>>>> >> > artifact that could contain works that have an incorrect
> >> license.
> >> >>>>>> >> >
> >> >>>>>> >> > We do this by building the audited internal registry:
> >> >>>>>> >> > - In the case of Airflow, this is a copy of the source
> >> code and
> >> >>>>>> the
> >> >>>>>> >> > necessary PIP libraries stored in the blockchain-based
> registry
> >> >>>>>> >> > (append-only registry). Any change in such a registry
> >> >>>>>> undergoes a
> >> >>>>>> review
> >> >>>>>> >> > process and must be approved. It is not possible to
> >> revert an
> >> >>>>>> approved
> >> >>>>>> >> > change without leaving a trace.
> >> >>>>>> >> > - In the case of Docker images, this means that each
> >> image is
> >> >>>>>> built
> >> >>>>>> >> > automatically, and no one publishes the images to images
> >> register
> >> >>>>>> >> manually
> >> >>>>>> >> > (docker push). No step can download files from a registry
> >> >>>>>> that is
> >> >>>>>> not
> >> >>>>>> >> > auditable.
> >> >>>>>> >> >
> >> >>>>>> >> > Such steps allow you to recreate the software development
> >> process,
> >> >>>>>> >> > e.g. in
> >> >>>>>> >> > the case of a court case.
> >> >>>>>> >> >
> >> >>>>>> >> > In our case, it won't be easy to introduce all similar
> >> >>>>>> requirements,
> >> >>>>>> >> > but we
> >> >>>>>> >> > can try to be compatible with them so that organizations
> that
> >> >>>>>> have the
> >> >>>>>> >> same
> >> >>>>>> >> > requirements can meet them.
> >> >>>>>> >> >
> >> >>>>>> >> > **What should we do?**
> >> >>>>>> >> >
> >> >>>>>> >> > In my opinion, this is similar to using libraries in our
> >> >>>>>> application.
> >> >>>>>> >> > We do
> >> >>>>>> >> > not perform a publisher assessment for every library we
> >> use. We
> >> >>>>>> only
> >> >>>>>> >> verify
> >> >>>>>> >> > license compliance.
> >> >>>>>> >> >
> >> >>>>>> >> > On the other hand, it looks different because it is "Object
> >> >>>>>> Code", not
> >> >>>>>> >> > "Source Code". We do not use source code directly, but we
> >> >>>>>> use an
> >> >>>>>> object
> >> >>>>>> >> > prepared by a third party - "Derivative Works".
> >> >>>>>> >> >
> >> >>>>>> >> > In my opinion, relying on any Docker image ("Object Code")
> >> >>>>>> is OK
> >> >>>>>> if they
> >> >>>>>> >> > meet the following requirements:
> >> >>>>>> >> > - The Source Code required to create the object should be
> >> publicly
> >> >>>>>> >> > available and should be compatible with the Apache license.
> >> >>>>>> >> > - We should have access to Compilation Information. The Compilation
> >> >>>>>> >> > Information must suffice to ensure that the continued functioning of the
> >> >>>>>> >> > source code is in no case prevented or interfered with solely because
> >> >>>>>> >> > modification has been made.
> >> >>>>>> >> >
> >> >>>>>> >> > Besides, we should provide the possibility to replace
> "Object
> >> >>>>>> code" with
> >> >>>>>> >> > other objects i.e., use of an image from a private
> third-party
> >> >>>>>> registry.
> >> >>>>>> >> >
> >> >>>>>> >> > Thank Jarek for paying attention to this issue.  I didn't
> think
> >> >>>>>> >> about it
> >> >>>>>> >> > before, but now I know I couldn't use the Helm Chart in its
> >> >>>>>> current
> >> >>>>>> >> > form in
> >> >>>>>> >> > any of my work. I am afraid that many members of our
> community
> >> >>>>>> >> would face
> >> >>>>>> >> > similar problems if they tried to use it in a production
> >> >>>>>> environment.
> >> >>>>>> >> >
> >> >>>>>> >> >
> >> >>>>>> >> > On Mon, Jun 22, 2020 at 3:08 PM Ash Berlin-Taylor <
> >> [email protected]
> >> >>>>>> >
> >> >>>>>> >> wrote:
> >> >>>>>> >> >
> >> >>>>>> >> >> Licensing wise there is no issue from me: The astronomerinc
> >> >>>>>> images are
> >> >>>>>> >> >> just re-packaging of the upstream images to apply security
> >> fixes
> >> >>>>>> >> so are
> >> >>>>>> >> >> licensed under whatever the original image is (MIT or
> Apache2
> >> >>>>>> usually,
> >> >>>>>> >> >> else we wouldn't have put them in the helm chart PR)
> >> >>>>>> >> >>
> >> >>>>>> >> >> For background, the reason that we at Astronomer created
> >> >>>>>> >> >> ap-pgbouncer-exporter in the first place is that the
> upstream
> >> >>>>>> package
> >> >>>>>> >> >> does not patch/rebuild to address security
> >> vulnerabilities. By
> >> >>>>>> taking
> >> >>>>>> >> >> this in to airflow-ext it means we as a project become
> >> >>>>>> responsible for
> >> >>>>>> >> >> monitoring and testing that. (And don't be fooled into thinking the
> >> >>>>>> >> >> free scanners can detect all vulns here, we've found them to be of very
> >> >>>>>> >> >> variable and questionable accuracy.)
> >> >>>>>> >> >>
> >> >>>>>> >> >> That is a non-trivial amount of work for an open source
> >> project.
> >> >>>>>> >> >>
> >> >>>>>> >> >> Has this ever caused us any problems outside of Pip/python
> >> >>>>>> dependencies?
> >> >>>>>> >> >> (I'm not aware of any.) For runtime this maybe makes sense
> >> >>>>>> (again, I'm
> >> >>>>>> >> >> not yet convinced), but for test-only/dev-only deps this
> seems
> >> >>>>>> >> like a
> >> >>>>>> >> >> lot of work that we could better spend on working on
> >> >>>>>> Airflow. If
> >> >>>>>> >> we pin
> >> >>>>>> >> >> versions of docker image used then the only real risk is a
> >> >>>>>> left-pad
> >> >>>>>> >> >> scenario of "I'm deleting all my images" which is a minor
> >> risk.
> >> >>>>>> >> >>
> >> >>>>>> >> >> Does any other project do anything like this? I haven't seen it before.
> >> >>>>>> >> >>
> >> >>>>>> >> >> I'd vote for doing nothing and addressing this in specific cases when it
> >> >>>>>> >> >> becomes a problem. Because I do not see using third party docker images
> >> >>>>>> >> >> as a risk. I see it as a time saving measure.
> >> >>>>>> >> >>
> >> >>>>>> >> >> -ash
> >> >>>>>> >> >>
> >> >>>>>> >> >> On Jun 22 2020, at 1:42 pm, Jarek Potiuk <
> >> >>>>>> [email protected]>
> >> >>>>>> >> wrote:
> >> >>>>>> >> >>
> >> >>>>>> >> >> > Hello everyone,
> >> >>>>>> >> >> >
> >> >>>>>> >> >> > TL;DR; I noticed that we are accumulating some
> >> >>>>>> dependencies to
> >> >>>>>> >> external
> >> >>>>>> >> >> > binaries (downloads and Docker images) which make the
> Apache
> >> >>>>>> Airflow
> >> >>>>>> >> >> > Community a bit vulnerable to external dependencies.  I
> >> would
> >> >>>>>> love
> >> >>>>>> >> your
> >> >>>>>> >> >> > comments/opinions on the proposal I made around this.
> >> >>>>>> >> >> >
> >> >>>>>> >> >> > *More explanation/status:*
> >> >>>>>> >> >> >
> >> >>>>>> >> >> > While dependence is fine for officially "released" and
> >> >>>>>> "managed" by
> >> >>>>>> >> the
> >> >>>>>> >> >> > owning organizations, I think it is a bit risky to
> >> depend on
> >> >>>>>> those
> >> >>>>>> >> long
> >> >>>>>> >> >> > term and I think we should aim to bring all those
> >> "vulnerable"
> >> >>>>>> >> >> dependencies
> >> >>>>>> >> >> > into community control.
> >> >>>>>> >> >> >
> >> >>>>>> >> >> > I reviewed all our code (or I think all !) looking for
> such
> >> >>>>>> >> dependencies
> >> >>>>>> >> >> > and prepared an "umbrella" issue where I proposed the
> >> approach
> >> >>>>>> >> we can
> >> >>>>>> >> >> take
> >> >>>>>> >> >> > for all such dependencies.
> >> >>>>>> >> >> >
> >> >>>>>> >> >> > I could have missed some - so if you find others feel
> >> >>>>>> free to
> >> >>>>>> >> comment/add
> >> >>>>>> >> >> > the new ones.
> >> >>>>>> >> >> > All the details are captured here:
> >> >>>>>> >> >> > https://github.com/apache/airflow/issues/9401 - I
> discussed
> >> >>>>>> the
> >> >>>>>> >> >> > context/motivation/current status and approach we can
> >> >>>>>> take for
> >> >>>>>> those
> >> >>>>>> >> >> > dependencies.
> >> >>>>>> >> >> >
> >> >>>>>> >> >> > A lot of those dependencies just need a review and maybe some updates
> >> >>>>>> >> >> > to the latest versions, and I do not think there is a lot to discuss
> >> >>>>>> >> >> > for those.
> >> >>>>>> >> >> >
> >> >>>>>> >> >> > There is one point, however, that I think requires more deliberate
> >> >>>>>> >> >> > action and some decisions.
> >> >>>>>> >> >> >
> >> >>>>>> >> >> > We have some dependencies on Docker images that we are using from
> >> >>>>>> >> >> > various sources:
> >> >>>>>> >> >> > 1) officially maintained images
> >> >>>>>> >> >> > 2) images released by organizations for their own purposes, but
> >> >>>>>> >> >> > not "officially maintained" by those organizations
> >> >>>>>> >> >> > 3) images released by private individuals
> >> >>>>>> >> >> >
> >> >>>>>> >> >> > While 1) is perfectly OK, I think for 2) and 3) we should bring the
> >> >>>>>> >> >> > images under Airflow community management. Here is the list of the
> >> >>>>>> >> >> > images I found that need to be moved to Airflow:
> >> >>>>>> >> >> >
> >> >>>>>> >> >> >   - aneeshkj/helm-unittest
> >> >>>>>> >> >> >   - ashb/apache-rat:0.13-1
> >> >>>>>> >> >> >   - godatadriven/krb5-kdc-server
> >> >>>>>> >> >> >   - polinux/stress (?)
> >> >>>>>> >> >> >   - osixia/openldap:1.2.0
> >> >>>>>> >> >> >   - astronomerinc/ap-statsd-exporter:0.11.0
> >> >>>>>> >> >> >   - astronomerinc/ap-pgbouncer:1.8.1
> >> >>>>>> >> >> >   - astronomerinc/ap-pgbouncer-exporter:0.5.0-1
> >> >>>>>> >> >> >
> >> >>>>>> >> >> >
> >> >>>>>> >> >> > *Proposal*:
> >> >>>>>> >> >> >
> >> >>>>>> >> >> > My proposal is to make a folder in our repository on GitHub
> >> >>>>>> >> >> > (continuing with the mono-repo approach we follow) to keep the
> >> >>>>>> >> >> > corresponding Dockerfiles and scripts that build and release the
> >> >>>>>> >> >> > images from there. Now the only question is where to keep those
> >> >>>>>> >> >> > images. We currently have apache/airflow, but I think we should
> >> >>>>>> >> >> > reserve it for Airflow images only and keep those images
> >> >>>>>> >> >> > elsewhere. Unfortunately, we cannot have "sub-images" of any sort in
> >> >>>>>> >> >> > DockerHub. We are already abusing the "apache/airflow" namespace a
> >> >>>>>> >> >> > bit, as we are keeping both CI and production images there (but
> >> >>>>>> >> >> > that's quite OK as the images are similar).
> >> >>>>>> >> >> >
> >> >>>>>> >> >> > My proposal is to create an *"apache/airflow-ext"* DockerHub
> >> >>>>>> >> >> > repository and keep the images there. The repository will also be a
> >> >>>>>> >> >> > little abused, because we will have to name the images with tags -
> >> >>>>>> >> >> > for example:
> >> >>>>>> >> >> >
> >> >>>>>> >> >> >   - apache/airflow-ext:helm-unittest-[version]
> >> >>>>>> >> >> >   - apache/airflow-ext:apache-rat-[version]
> >> >>>>>> >> >> >
> >> >>>>>> >> >> > I am also open to other names for the repo and to proposals for
> >> >>>>>> >> >> > other ways to handle that.
> >> >>>>>> >> >> >
> >> >>>>>> >> >> > I believe there is no issue with licences for any of those images
> >> >>>>>> >> >> > (Ash, Kaxil, Fokko - some of the images are
> >> >>>>>> >> >> > Astronomer's/GoDataDriven's - can you comment on that?): I believe
> >> >>>>>> >> >> > the licensing on all of them allows us to copy them with
> >> >>>>>> >> >> > attribution (I will double-check that for the other images).
> >> >>>>>> >> >> >
> >> >>>>>> >> >> > WDYT?
> >> >>>>>> >> >> >
> >> >>>>>> >> >> > J.
> >> >>>>>> >> >> >
> >> >>>>>> >> >> >
> >> >>>>>> >> >> >
> >> >>>>>> >> >> > --
> >> >>>>>> >> >> >
> >> >>>>>> >> >> > Jarek Potiuk
> >> >>>>>> >> >> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >> >>>>>> >> >> >
> >> >>>>>> >> >> > M: +48 660 796 129 <+48660796129>
> >> >>>>>> >> >> > [image: Polidea] <https://www.polidea.com/>


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>
