Sure - we could do that as well if we agree on that.

Just to explain - the repository is really a "fork" of the original one
with our modifications on top. The only reason it's not an "actual" GitHub
fork is that I cannot create a fork in the "apache" organisation.
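
For illustration, keeping such a manually created fork in sync with the
original project could look roughly like the sketch below. The remote name
"upstream" and the function name sync_upstream are assumptions made for this
example; they are not something that is already set up in these repos.

```shell
#!/usr/bin/env bash
# Sketch: merge upstream changes into a manually created "fork".
# The remote name "upstream" and this helper are illustrative assumptions.
set -euo pipefail

# sync_upstream <upstream-url> <branch-or-tag>
# Run inside the fork's working tree, on the branch that receives the merge.
sync_upstream() {
    local upstream_url="$1"
    local upstream_ref="$2"

    # Register the original project as a remote, or update its URL if present.
    if git remote get-url upstream >/dev/null 2>&1; then
        git remote set-url upstream "${upstream_url}"
    else
        git remote add upstream "${upstream_url}"
    fi

    # Fetch the requested upstream ref and merge it on top of our own commits.
    git fetch --quiet upstream "${upstream_ref}"
    git merge --no-edit FETCH_HEAD
}
```

It would be called from a checkout of the fork as
`sync_upstream <upstream-url> <branch-or-tag>`. Because our modifications
live in separate commits, a plain merge like this should usually be enough.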

J.


On Mon, Jul 6, 2020 at 2:22 PM Ash Berlin-Taylor <[email protected]> wrote:

> I've just taken a look at the
> https://github.com/apache/airflow-pgbouncer-exporter (I'm assuming the
> others are the same) and "woah, wait" was my reaction.
>
> Having a repo where we include the Dockerfile and build scripts: I'm
> okay with that.
>
> This approach where we have an entire copy of the code and have
> essentially forked the upstream project: not happy verging on a
> -1/veto of this approach.
>
> I.e. I'd prefer this repo was just a Dockerfile that pulls the upstream
> project from a published release/git tag/pinned commit sha.
>
> -ash
>
> On Jul 6 2020, at 12:46 pm, Jarek Potiuk <[email protected]> wrote:
>
> > One more comment. I started the discussion in the build devlist of
> Apache:
> >
> https://lists.apache.org/thread.html/rf2af2a95e7687fe94ede23fe9df388f784c8231a5968b109f677cbe8%40%3Cbuilds.apache.org%3E
> > - and so far there are no conclusive answers. It is something that is not
> > regulated clearly by ASF rules it seems.
> >
> > So seems to me we are free to choose what our approach is (for now):
> >
> > But I have found this at least:
> >
> > https://www.apache.org/legal/release-policy.html#what
> >
> > "The Apache Software Foundation produces open source software. All
> releases
> > are in the form of the source materials needed to make changes to the
> > software being released. In some cases, binary/bytecode packages are also
> > produced as a convenience to users that might not have the appropriate
> > tools to build a compiled version of the source. In all such cases, the
> > binary/bytecode package must have the same version number as the source
> > release and may only add binary/bytecode files that are the result of
> > compiling that version of the source code release."
> >
> > I think "the spirit" of that chapter is something that I am referring
> > to -
> > from the beginning of the thread.
> >
> > I really think if we give our users a convenient way of using some binary
> > packages (i.e. docker images) there should be an easy way to reproduce
> > those from sources. I have the feeling that my proposal is simply an
> > embodiment of that rule. Glad to hear what others think about it. I am
> fully
> > aware it is a "gray" area, but I think with a very little cost we can
> move
> > it to the "white" area.
> >
> > J.
> >
> >
> >
> > On Sun, Jul 5, 2020 at 11:42 AM Jarek Potiuk <[email protected]>
> > wrote:
> >
> >> Hello Everyone,
> >>
> >> TL;DR: I did some experiments with those images and I have a workable
> >> proposal on how we can handle that.
> >>
> >> I already created a few repos to see how it can work and I think I have a
> >> workable and rather easy-to-maintain solution. We can still delete this if
> >> we choose another way, of course. I just wanted to make sure all of the
> >> below is "workable", so I simply implemented a complete, working solution.
> >> It's not that complex, but it's good that I did it - I found a few things
> >> that had to be fixed in the Dockerfiles and build scripts provided by the
> >> upstream repos, and I also made sure that we are using the latest patched
> >> versions of all the tools. In all cases we can rebuild everything from
> >> sources - we do not have to rely on some binary that we trust was built
> >> from the sources (other than official images).
> >>
> >> Happy to hear any comments, but I propose that if the below looks
> >> good to
> >> you, we get a lazy consensus and I simply implement and document it. I
> >> would also make it a rule for our images that we keep that approach for
> >> future images.
> >>
> >> *More details:*
> >>
> >> 1) I brought all the images to the "apache/airflow" DockerHub registry:
> >> both dev images and the ones used in the chart. I tried to have a separate
> >> "airflowdev" user but it turned out to be not really good - it's either a
> >> one-user account or an organization with up to three people for free. That
> >> would be a bit of a hassle to manage with 2-factor authentication etc.
> >> I think it's actually quite good to have an
> >> "apache/airflow:helm-unittest-2020.07.10-v0.2.0-v3.1.2" image. Docker
> >> works well in this setup and I think it's rather nice to have all the
> >> images in one registry.
> >>
> >> 2) We have three more repos where I cloned the code for those images that
> >> required a "whole" repo and made them standalone - i.e. depending only on
> >> official images/binaries released by the organizations "owning" the code in
> >> question and on code that is officially released in the official "apt" or
> >> "apk" (alpine) repositories. I made some airflow-specific modifications
> >> there (labels, maintainer, sometimes some configuration changes, build
> >> scripts). Those changes are merged as separate commits - we should be able
> >> to bring upstream changes from those repos rather easily if we want. Those
> >> are the repos:
> >>
> >> * https://github.com/apache/airflow-pgbouncer-exporter
> >> * https://github.com/apache/airflow-openldap
> >> * https://github.com/apache/airflow-helm-unittest
> >>
> >> 3) For those images that did not require a whole separate repository, I
> >> created scripts/Dockerfile folders in those two PRs: "chart/dockerfiles
> >> <https://github.com/apache/airflow/pull/9650>" directory for "helm"
> >> images and "scripts/ci/dockerfiles
> >> <https://github.com/apache/airflow/pull/9652>" for CI images.
> >>
> >> 4) All the images are based either on "alpine" or "debian-slim" or
> >> "ubuntu-slim" images and they are optimized for size.
> >>
> >> 5) All the images keep similar naming conventions and have similar build
> >> scripts that you can simply run to rebuild the images from scratch
> (bumping
> >> the versions, bringing upstream changes before as needed). An example
> build
> >> script is below. It will be very easy to upgrade those images as
> >> needed and
> >> release them separately or all at the same time. Example naming
> convention:
> >>
> >> *apache/airflow:airflow-pgbouncer-2020.07.10-1.14.0*
> >>
> >> Legend:
> >>
> >> * *pgbouncer* image released by airflow
> >> * *1.14.0* - version of pgbouncer
> >> * *2020.07.10* - calver version of the image (roughly - the time when
> the
> >> image was released/created by Airflow)
> >>
> >>
> >> 6) All images have a consistent labeling scheme - including commit SHA
> >> used to generate the image:
> >>
> >>
> >>     "Labels": {
> >>         "org.apache.airflow.airflow_pgbouncer.version": "2020.07.10",
> >>         "org.apache.airflow.commit_sha": "43e6406a84d2589bd54c3c37ceaa0c3ebaa9de26",
> >>         "org.apache.airflow.component": "pgbouncer",
> >>         "org.apache.airflow.pgbouncer.version": "1.14.0"
> >>     }
> >>
> >>
> >> 7) No regular maintenance is needed for CI images - we can bump them
> from
> >> time to time on an ad-hoc basis or when we need to increase version. For
> >> Helm images I think we should release new versions of those images every
> >> time we release Helm chart - we can then rebuild the images using the
> >> latest patches of debian/alpine and latest versions of the software
> >> we have
> >> in them.
> >>
> >> 8) Example build script
> >>
> >> #!/usr/bin/env bash
> >> # Licensed to the Apache Software Foundation (ASF) under one
> >> # ... licence here
> >> set -euo pipefail
> >> # DockerHub coordinates - can be overridden from the environment
> >> DOCKERHUB_USER=${DOCKERHUB_USER:="apache"}
> >> DOCKERHUB_REPO=${DOCKERHUB_REPO:="airflow"}
> >> # Upstream pgbouncer version and CalVer version of the Airflow image
> >> PGBOUNCER_VERSION="1.14.0"
> >> AIRFLOW_PGBOUNCER_VERSION="2020.07.10"
> >> # Commit the image is built from (passed to the build below)
> >> COMMIT_SHA=$(git rev-parse HEAD)
> >>
> >> # Build from the directory containing this script
> >> cd "$( dirname "${BASH_SOURCE[0]}" )" || exit 1
> >>
> >> TAG="${DOCKERHUB_USER}/${DOCKERHUB_REPO}:airflow-pgbouncer-${AIRFLOW_PGBOUNCER_VERSION}-${PGBOUNCER_VERSION}"
> >>
> >> # --pull makes sure the latest base image is used for the build
> >> docker build . \
> >>     --pull \
> >>     --build-arg "PGBOUNCER_VERSION=${PGBOUNCER_VERSION}" \
> >>     --build-arg "AIRFLOW_PGBOUNCER_VERSION=${AIRFLOW_PGBOUNCER_VERSION}" \
> >>     --build-arg "COMMIT_SHA=${COMMIT_SHA}" \
> >>     --tag "${TAG}"
> >>
> >> docker push "${TAG}"
> >>
> >>
> >> J.
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Thu, Jul 2, 2020 at 2:12 PM Jarek Potiuk <[email protected]>
> >> wrote:
> >>
> >>> And the right Greg here :(,
> >>>
> >>> J.
> >>>
> >>>
> >>>
> >>> On Thu, Jul 2, 2020 at 12:18 PM Jarek Potiuk <[email protected]
> >
> >>> wrote:
> >>>
> >>>> Hey Ash, Greg, Daniel,
> >>>>
> >>>> So I understand there is no problem with licenses for those images and
> >>>> we can get/use the sources for those?
> >>>>
> >>>> I would love to add the scripts/Dockerfiles to the sources - to be
> able
> >>>> to rebuild the images. I have some of those already and would like
> >>>> to make
> >>>> a PR, but it would be great if we can get the Dockerfile sources.
> >>>> I also
> >>>> want to ask a few questions about versions of the base images (some
> >>>> of the
> >>>> base images seem to be quite old and there are newer releases so I
> wanted
> >>>> to check if there is anything to prevent upgrading them).
> >>>>
> >>>> J
> >>>>
> >>>>
> >>>> On Thu, Jun 25, 2020 at 12:58 PM Jarek Potiuk <
> [email protected]>
> >>>> wrote:
> >>>>
> >>>>>
> >>>>> On Thu, Jun 25, 2020 at 12:27 PM Ash Berlin-Taylor <[email protected]>
> >>>>> wrote:
> >>>>>
> >>>>>> > - apache/airflow:statsd-exporter-2020.6.31
> >>>>>> > - apache/airflow:pgbouncer-2020.6.31
> >>>>>> > - apache/airflow:pgbouncer-exporter-2020.6.31
> >>>>>
> >>>>>> Do we count these as "releases" (i.e. do the PMC need to vote on them)
> >>>>>> or not?
> >>>>>>
> >>>>>
> >>>>> I think we should. I believe we should make it a part of regular
> >>>>> release and vote together on "airflow + prod image + helm + dependent
> >>>>> images".
> >>>>> Then we might release each of those separately if needed -  with
> >>>>> separate voting/process (possibly we can bundle together several
> different
> >>>>> things to release). Hence CalVer might make more sense even if we
> release
> >>>>> them together with 1.10.x or 2.Y (especially that those deps are
> pretty
> >>>>> much independent from the airflow version used). I think for
> >>>>> Airflow + Prod
> >>>>> image, it makes perfect sense to keep 1.10.* 2.0.* - but for Helm and
> >>>>> dependent images - CalVer seems like a better idea.
> >>>>>
> >>>>>
> >>>>>> For these I think including the upstream version is useful too (either
> >>>>>> as well, or instead) -- that way people can look at the right version of
> >>>>>> the upstream docs when looking at what configuration options there are.
> >>>>>> so `apache/airflow:pgbouncer-1.8.1-1` or
> >>>>>> `apache/airflow:pgbouncer-1.8.1-2020.6.31` (nice date btw :D )
> >>>>>>
> >>>>>
> >>>>> Agree. BTW. I wondered if anyone notices the date ;).
> >>>>>
> >>>>>> (FYI For pgbouncer-exporter there are three such projects on github,
> >>>>>> Juraj's was picked somewhat randomly)
> >>>>>>
> >>>>>> > I think now it's the matter of just following up with the
> >>>>>> > releases of pgbouncer and libressl and libressl-dev
> >>>>>>
> >>>>>> That's still a fairly big "just". And the ssl libraries aren't the only
> >>>>>> sources of security patches needed. Also the act of updating is the easy
> >>>>>> part -- it's the notification to know when updates are needed, and
> >>>>>> ensuring that they happen in a timely manner that is the hard part :)
> >>>>>>
> >>>>>
> >>>>> True. But I think we have some precedent in our CI/Prod images. We
> >>>>> currently have it automated so that they self-maintain and self-upgrade:
> >>>>> https://github.com/apache/airflow/blob/master/CI.rst. The current CI
> >>>>> automation is done in such a way that we are catching up fairly quickly
> >>>>> with the latest python patches - almost without noticing (well, there is a
> >>>>> period of a few hours where the builds on CI get slower and people need to
> >>>>> update their Breeze images). But other than that it happens automatically
> >>>>> and without anyone doing any active work there.
> >>>>>
> >>>>> I can do a very similar approach for all the images (both dev and
> >>>>> runtime) and add a notification component to notify us if any of the
> >>>>> upstream deps change. So it will be - from our side - mostly deciding
> >>>>> if we should release it out-of-band or wait for a "regular" release.
> >>>>>
> >>>>> J.
> >>>>>
> >>>>>
> >>>>>> On Jun 25 2020, at 11:05 am, Jarek Potiuk <[email protected]
> >
> >>>>>> wrote:
> >>>>>>
> >>>>>> > I think  I'd feel more comfortable if we have it all under
> >>>>>> "community"
> >>>>>> > umbrella.
> >>>>>> >
> >>>>>> >   - For dev images - I think we have a good idea from couchdb. I
> >>>>>> will make
> >>>>>> >   a POC of that and PR shortly. I already created airflowdev
> account
> >>>>>> on
> >>>>>> >   Dockerhub and make it available to PMCs of Airflow and
> >>>>>> connect it
> >>>>>> to our
> >>>>>> >   repo to automate Dev dependencies.
> >>>>>> >   - For the runtime (astronomer) images I took a deeper look
> >>>>>> and I
> >>>>>> think
> >>>>>> >   it makes perfect sense to add them and release by Airflow
> Community
> >>>>>> > as well:
> >>>>>> >
> >>>>>> > Here is what is in those images:
> >>>>>> >
> >>>>>> >   - astronomerinc/ap-statsd-exporter
> >>>>>> >   <
> >>>>>>
> https://hub.docker.com/layers/astronomerinc/ap-statsd-exporter/latest/images/sha256-69538dc71521489733bb21823505a75a02a4c54d1d07eaa2be9fa7eb58763b7f?context=explore
> >>>>>> >
> >>>>>> >   - this image is just based on the official Prometheus Statsd exporter
> >>>>>> >   with the added file "/etc/statsd-exporter/mappings.yml". So the
> >>>>>> >   maintenance is mainly about keeping the mapping and possibly upgrading
> >>>>>> >   to the latest released prometheus-statsd occasionally. The first one
> >>>>>> >   sounds like a good
> >>>>>> > idea for
> >>>>>> >   community work, the second we can easily automate - same way
> >>>>>> as we
> >>>>>> > do for
> >>>>>> >   production images. Seems that this one is updated once every few
> >>>>>> > months, so
> >>>>>> >   we can easily do that. astronomerinc/ap-pgbouncer:latest
> >>>>>> >   - astronomerinc/ap-pgbouncer
> >>>>>> >   <
> >>>>>>
> https://hub.docker.com/layers/astronomerinc/ap-pgbouncer/latest/images/sha256-9820007e1e62eb988cb603929b1eaf0989052cd01b73a3004274b21d143f9654?context=explore
> >>>>>> >
> >>>>>> >   - this is just packaging pgbouncer into an image - this one
> seems
> >>>>>> to be
> >>>>>> >   updated more frequently in the past but I think now it's the
> matter
> >>>>>> > of just
> >>>>>> >   following up with the releases of pgbouncer and libressl and
> >>>>>> libressl-dev
> >>>>>> >
> >>>>>> >   - astronomerinc/ap-pgbouncer-exporter
> >>>>>> >   <
> >>>>>>
> https://hub.docker.com/layers/astronomerinc/ap-pgbouncer-exporter/latest/images/sha256-6e9d4f2d66dafecfd2f29239a1957edb2c953b8299e487ec8b04a96206d2da4e?context=explore
> >>>>>> >
> >>>>>> >   - this is pgbouncer exporter based on Juraj Bubniak's PGBouncer
> >>>>>> Prometheus
> >>>>>> >   exporter with libressl and libressl-dev library upgraded. Also
> >>>>>> usually
> >>>>>> >   updated every few months. Here I think it would also make
> >>>>>> sense to
> >>>>>> bring
> >>>>>> >   the source code in to the community for Juraj's image as well.
> >>>>>> >
> >>>>>> > I also think it would make sense (unlike the dev dependencies) to
> >>>>>> publish
> >>>>>> > all "runtime" deps under the "apache/airflow" repository. That
> would
> >>>>>> > be a
> >>>>>> > bit awkward, but I think it's the least "effort" we need to
> maintain
> >>>>>> and
> >>>>>> > make sure it is officially "blessed" during the release.
> >>>>>> >
> >>>>>> > So the proposal I have (if we use calver versioning similar to
> >>>>>> backport
> >>>>>> > packages):
> >>>>>> >
> >>>>>> >   - apache/airflow:statsd-exporter-2020.6.31
> >>>>>> >   - apache/airflow:pgbouncer-2020.6.31
> >>>>>> >   - apache/airflow:pgbouncer-exporter-2020.6.31
> >>>>>> >
> >>>>>> > I am happy to bring it all to our repo and setup automation.
> >>>>>> >
> >>>>>> > J.
> >>>>>> >
> >>>>>> >
> >>>>>> >
> >>>>>> > On Thu, Jun 25, 2020 at 11:19 AM Ash Berlin-Taylor <
> [email protected]>
> >>>>>> wrote:
> >>>>>> >
> >>>>>> >> Wow Kamil that's an awesome and mature process for a company to
> >>>>>> take --
> >>>>>> >> I wish more companies treated open source deps that way.
> >>>>>> >>
> >>>>>> >> As I mentioned in the original Helm PR (but just in a comment
> left
> >>>>>> to a
> >>>>>> >> review), I left a few of the "support" Docker images as
> >>>>>> astronomerinc
> >>>>>> >> ones as the upstream Docker images are "unmaintained" (that isn't
> >>>>>> to say
> >>>>>> >> the projects are, just that the images aren't re-published in a
> >>>>>> timely
> >>>>>> >> fashion to update openssl etc.)
> >>>>>> >>
> >>>>>> >> I am happy to replace the astronomerinc support images with
> others
> >>>>>> if we
> >>>>>> >> want to. I am also happy to clarify/make explicit the license
> >>>>>> situation
> >>>>>> >> that those images are distributed under (Apache 2) if we want to
> >>>>>> stick
> >>>>>> >> with them and let us (Astronomer) carry the burden of patching
> and
> >>>>>> >> updating them -- it is after all part of what people pay us
> >>>>>> for so
> >>>>>> we'll
> >>>>>> >> be doing it anyway.
> >>>>>> >>
> >>>>>> >> > Besides, we should provide the possibility to replace "Object
> >>>>>> code" with
> >>>>>> >> > other objects i.e., use of an image from a private third-party
> >>>>>> registry.
> >>>>>> >>
> >>>>>> >> The images to use come from the helm values, so are easily
> >>>>>> changable at
> >>>>>> >> helm install/upgrade time:
> >>>>>> >>
> >>>>>> >>
> >>>>>> >>
> >>>>>>
> https://github.com/apache/airflow/blob/ec0025f35be212b248c284efa04acf2d96845681/chart/values.yaml#L68-L92
> >>>>>> >>
> >>>>>> >> -ash
> >>>>>> >>
> >>>>>> >> On Jun 24 2020, at 9:07 am, Kamil Breguła <
> >>>>>> [email protected]>
> >>>>>> >> wrote:
> >>>>>> >>
> >>>>>> >> > These files have no information to determine the license.
> >>>>>> In my
> >>>>>> opinion,
> >>>>>> >> > these images ("Derivative Works") should be treated as
> >>>>>> Astronomer's or
> >>>>>> >> > other users' copyrighted files. Please note that Astronomer may
> >>>>>> >> distribute
> >>>>>> >> > the images under a different license, but they need to
> >>>>>> acknowledge the
> >>>>>> >> use
> >>>>>> >> > of the Foundation or other licensed software. To do otherwise
> >>>>>> would be
> >>>>>> >> > stealing.
> >>>>>> >> >
> >>>>>> >> > DockerHub is not an Open Source software registry, and we
> cannot
> >>>>>> assume
> >>>>>> >> > that every image there is available under a license that allows
> >>>>>> >> free use.
> >>>>>> >> >
> >>>>>> >> > **What does this mean for the project?**
> >>>>>> >> >
> >>>>>> >> > This is incompatible with the Apache license because each runtime
> >>>>>> >> > dependency must also be based on an Apache-compatible license. These
> >>>>>> >> > images are required to run the Helm Chart, so they are its
> >>>>>> >> > dependencies. Dependencies that are not compatible with the Apache
> >>>>>> >> > license are a problem for our users and prevent the use of this
> >>>>>> >> > project.
> >>>>>> >> >
> >>>>>> >> > **How do we deal with this topic in my organization?**
> >>>>>> >> >
> >>>>>> >> > We take the topic of copyright very seriously in my
> organization.
> >>>>>> >> One of
> >>>>>> >> > the steps we take before publishing a derivative work based
> >>>>>> on an
> >>>>>> >> > Open-Source license is to audit the source code to see if each
> >>>>>> part is
> >>>>>> >> > under a license that allows us to use it. If we build images or
> >>>>>> artifacts
> >>>>>> >> > automatically, we take steps that prevent the accidental
> >>>>>> publication
> >>>>>> >> > of an
> >>>>>> >> > artifact that could contain works that have an incorrect
> license.
> >>>>>> >> >
> >>>>>> >> > We do this by building the audited internal registry:
> >>>>>> >> > - In the case of Airflow, this is a copy of the source code and
> >>>>>> the
> >>>>>> >> > necessary PIP libraries stored in the blockchain-based registry
> >>>>>> >> > (append-only registry). Any change in such a registry
> >>>>>> undergoes a
> >>>>>> review
> >>>>>> >> > process and must be approved. It is not possible to revert an
> >>>>>> approved
> >>>>>> >> > change without leaving a trace.
> >>>>>> >> > - In the case of Docker images, this means that each image is
> >>>>>> built
> >>>>>> >> > automatically, and no one publishes the images to the image registry
> >>>>>> >> > manually
> >>>>>> >> > (docker push). No step can download files from a registry
> >>>>>> that is
> >>>>>> not
> >>>>>> >> > auditable.
> >>>>>> >> >
> >>>>>> >> > Such steps allow you to recreate the software development
> process,
> >>>>>> >> > e.g. in
> >>>>>> >> > the case of a court case.
> >>>>>> >> >
> >>>>>> >> > In our case, it won't be easy to introduce all similar
> >>>>>> requirements,
> >>>>>> >> > but we
> >>>>>> >> > can try to be compatible with them so that organizations that
> >>>>>> have the
> >>>>>> >> same
> >>>>>> >> > requirements can meet them.
> >>>>>> >> >
> >>>>>> >> > **What should we do?**
> >>>>>> >> >
> >>>>>> >> > In my opinion, this is similar to using libraries in our
> >>>>>> application.
> >>>>>> >> > We do
> >>>>>> >> > not perform a publisher assessment for every library we use. We
> >>>>>> only
> >>>>>> >> verify
> >>>>>> >> > license compliance.
> >>>>>> >> >
> >>>>>> >> > On the other hand, it looks different because it is "Object
> >>>>>> Code", not
> >>>>>> >> > "Source Code". We do not use source code directly, but we
> >>>>>> use an
> >>>>>> object
> >>>>>> >> > prepared by a third party - "Derivative Works".
> >>>>>> >> >
> >>>>>> >> > In my opinion, relying on any Docker image ("Object Code")
> >>>>>> is OK
> >>>>>> if they
> >>>>>> >> > meet the following requirements:
> >>>>>> >> > - The Source Code required to create the object should be
> publicly
> >>>>>> >> > available and should be compatible with the Apache license.
> >>>>>> >> > - We should have access to Compilation Information. The
> >>>>>> Compilation
> >>>>>> >> > Information must suffice to ensure that the continued
> functioning
> >>>>>> >> of the
> >>>>>> >> > source code is in no case prevented or interfered with solely
> >>>>>> because
> >>>>>> >> > modification has been made.
> >>>>>> >> >
> >>>>>> >> > Besides, we should provide the possibility to replace "Object
> >>>>>> code" with
> >>>>>> >> > other objects i.e., use of an image from a private third-party
> >>>>>> registry.
> >>>>>> >> >
> >>>>>> >> > Thanks Jarek for paying attention to this issue.  I didn't think
> >>>>>> >> about it
> >>>>>> >> > before, but now I know I couldn't use the Helm Chart in its
> >>>>>> current
> >>>>>> >> > form in
> >>>>>> >> > any of my work. I am afraid that many members of our community
> >>>>>> >> would face
> >>>>>> >> > similar problems if they tried to use it in a production
> >>>>>> environment.
> >>>>>> >> >
> >>>>>> >> >
> >>>>>> >> > On Mon, Jun 22, 2020 at 3:08 PM Ash Berlin-Taylor <
> [email protected]
> >>>>>> >
> >>>>>> >> wrote:
> >>>>>> >> >
> >>>>>> >> >> Licensing wise there is no issue from me: The astronomerinc
> >>>>>> images are
> >>>>>> >> >> just re-packaging of the upstream images to apply security
> fixes
> >>>>>> >> so are
> >>>>>> >> >> licensed under whatever the original image is (MIT or Apache2
> >>>>>> usually,
> >>>>>> >> >> else we wouldn't have put them in the helm chart PR)
> >>>>>> >> >>
> >>>>>> >> >> For background, the reason that we at Astronomer created
> >>>>>> >> >> ap-pgbouncer-exporter in the first place is that the upstream
> >>>>>> package
> >>>>>> >> >> does not patch/rebuild to address security vulnerabilities. By
> >>>>>> taking
> >>>>>> >> >> this in to airflow-ext it means we as a project become
> >>>>>> responsible for
> >>>>>> >> >> monitoring and testing that. (And don't be fooled in to
> thinking
> >>>>>> the
> >>>>>> >> >> free scanners can detect all vulns here, we've found them to be of
> >>>>>> >> >> very variable, and questionable, accuracy.)
> >>>>>> >> >>
> >>>>>> >> >> That is a non-trivial amount of work for an open source
> project.
> >>>>>> >> >>
> >>>>>> >> >> Has this ever caused us any problems outside of Pip/python
> >>>>>> dependencies?
> >>>>>> >> >> (I'm not aware of any.) For runtime this maybe makes sense
> >>>>>> (again, I'm
> >>>>>> >> >> not yet convinced), but for test-only/dev-only deps this seems
> >>>>>> >> like a
> >>>>>> >> >> lot of work that we could better spend on working on
> >>>>>> Airflow. If
> >>>>>> >> we pin
> >>>>>> >> >> versions of docker image used then the only real risk is a
> >>>>>> left-pad
> >>>>>> >> >> scenario of "I'm deleting all my images" which is a minor
> risk.
> >>>>>> >> >>
> >>>>>> >> >> Does any other project do anything like this? I haven't seen it
> >>>>>> before.
> >>>>>> >> >>
> >>>>>> >> >> I'd vote for doing nothing and addressing this in specific
> cases
> >>>>>> >> when it
> >>>>>> >> >> becomes a problem. Because I do not see using third party
> docker
> >>>>>> images
> >>>>>> >> >> as a risk. I see it as a time saving measure.
> >>>>>> >> >>
> >>>>>> >> >> -ash
> >>>>>> >> >>
> >>>>>> >> >> On Jun 22 2020, at 1:42 pm, Jarek Potiuk <
> >>>>>> [email protected]>
> >>>>>> >> wrote:
> >>>>>> >> >>
> >>>>>> >> >> > Hello everyone,
> >>>>>> >> >> >
> >>>>>> >> >> > TL;DR; I noticed that we are accumulating some
> >>>>>> dependencies to
> >>>>>> >> external
> >>>>>> >> >> > binaries (downloads and Docker images) which make the Apache
> >>>>>> Airflow
> >>>>>> >> >> > Community a bit vulnerable to external dependencies.  I
> would
> >>>>>> love
> >>>>>> >> your
> >>>>>> >> >> > comments/opinions on the proposal I made around this.
> >>>>>> >> >> >
> >>>>>> >> >> > *More explanation/status:*
> >>>>>> >> >> >
> >>>>>> >> >> > While dependence is fine for officially "released" and
> >>>>>> "managed" by
> >>>>>> >> the
> >>>>>> >> >> > owning organizations, I think it is a bit risky to depend on
> >>>>>> those
> >>>>>> >> long
> >>>>>> >> >> > term and I think we should aim to bring all those
> "vulnerable"
> >>>>>> >> >> dependencies
> >>>>>> >> >> > into community control.
> >>>>>> >> >> >
> >>>>>> >> >> > I reviewed all our code (or I think all !) looking for such
> >>>>>> >> dependencies
> >>>>>> >> >> > and prepared an "umbrella" issue where I proposed the
> approach
> >>>>>> >> we can
> >>>>>> >> >> take
> >>>>>> >> >> > for all such dependencies.
> >>>>>> >> >> >
> >>>>>> >> >> > I could have missed some - so if you find others feel
> >>>>>> free to
> >>>>>> >> comment/add
> >>>>>> >> >> > the new ones.
> >>>>>> >> >> > All the details are captured here:
> >>>>>> >> >> > https://github.com/apache/airflow/issues/9401 - I discussed
> >>>>>> the
> >>>>>> >> >> > context/motivation/current status and approach we can
> >>>>>> take for
> >>>>>> those
> >>>>>> >> >> > dependencies.
> >>>>>> >> >> >
> >>>>>> >> >> > A lot of those dependencies just need review and maybe some
> >>>>>> >> updates to
> >>>>>> >> >> > latest versions. And I do not think there is a lot to
> discuss
> >>>>>> for
> >>>>>> >> those.
> >>>>>> >> >> >
> >>>>>> >> >> > There is one point, however, that requires more deliberate
> >>>>>> >> action and
> >>>>>> >> >> some
> >>>>>> >> >> > decisions I think.
> >>>>>> >> >> >
> >>>>>> >> >> > We have some dependencies on Docker images that we are using
> >>>>>> from
> >>>>>> >> various
> >>>>>> >> >> > sources:
> >>>>>> >> >> > 1) officially maintained images
> >>>>>> >> >> > 2) images released by organizations that released them for
> >>>>>> their own
> >>>>>> >> >> > purpose, but they are not "officially maintained" by those
> >>>>>> >> organizations
> >>>>>> >> >> > 3) images released by private individuals
> >>>>>> >> >> >
> >>>>>> >> >> > While 1) is perfectly OK, I think for 2) and 3) we should
> >>>>>> bring the
> >>>>>> >> >> images
> >>>>>> >> >> > to Airflow community management. Here is the list of those
> >>>>>> >> images I
> >>>>>> >> found
> >>>>>> >> >> > that need to be moved to Airflow:
> >>>>>> >> >> >
> >>>>>> >> >> >   - aneeshkj/helm-unittest
> >>>>>> >> >> >   - ashb/apache-rat:0.13-1
> >>>>>> >> >> >   - godatadriven/krb5-kdc-server
> >>>>>> >> >> >   - polinux/stress (?)
> >>>>>> >> >> >   - osixia/openldap:1.2.0
> >>>>>> >> >> >   - astronomerinc/ap-statsd-exporter:0.11.0
> >>>>>> >> >> >   - astronomerinc/ap-pgbouncer:1.8.1
> >>>>>> >> >> >   - astronomerinc/ap-pgbouncer-exporter:0.5.0-1
> >>>>>> >> >> >
> >>>>>> >> >> >
> >>>>>> >> >> > *Proposal*:
> >>>>>> >> >> >
> >>>>>> >> >> > My proposal is to make a folder in our repository on Github
> >>>>>> (continue
> >>>>>> >> >> with
> >>>>>> >> >> > the mono-repo approach we follow) to keep corresponding
> >>>>>> Dockerfiles
> >>>>>> >> and
> >>>>>> >> >> > scripts that build and release images from there. Now the
> only
> >>>>>> >> >> > question is
> >>>>>> >> >> > where to keep those images. We currently have apache/airflow
> >>>>>> but I
> >>>>>> >> >> > think we
> >>>>>> >> >> > should reserve it for airflow images only and we should keep
> >>>>>> those
> >>>>>> >> images
> >>>>>> >> >> > elsewhere. Unfortunately, we cannot have "sub-images" of any
> >>>>>> >> sort in
> >>>>>> >> >> > DockerHub. We are already abusing a bit the "apache/airflow"
> >>>>>> >> >> namespace as
> >>>>>> >> >> > we are keeping both CI and production images there (but
> that's
> >>>>>> quite
> >>>>>> >> >> > OK as
> >>>>>> >> >> > the images are similar).
> >>>>>> >> >> >
> >>>>>> >> >> > My proposal will be to create an* "apache/airflow-ext"*
> >>>>>> DockerHub
> >>>>>> >> >> > repository and keep the images there. They will also be a
> >>>>>> little
> >>>>>> >> >> > abused because we will have to name them with tags - for
> >>>>>> example:
> >>>>>> >> >> >
> >>>>>> >> >> >   - apache/airflow-ext:helm-unittest-[version]
> >>>>>> >> >> >   - apache/airflow-ext:apache-rat-[version]
> >>>>>> >> >> >
> >>>>>> >> >> > I am also open to other names for the repo and proposals
> other
> >>>>>> ways
> >>>>>> >> >> > how to
> >>>>>> >> >> > handle that.
> >>>>>> >> >> >
> >>>>>> >> >> > I believe there is no issue with Licences for either of
> those
> >>>>>> images
> >>>>>> >> >> (Ash,
> >>>>>> >> >> > Kaxil, Fokko - some of the images are
> >>>>>> Astronomer's/GoDataDriven's
> >>>>>> >> >> ones -
> >>>>>> >> >> > can you comment on that ?)  but I believe licensing on all
> >>>>>> those
> >>>>>> >> >> > images are
> >>>>>> >> >> > ok for us to copy with attribution (I will double-check that
> >>>>>> for other
> >>>>>> >> >> > images).
> >>>>>> >> >> >
> >>>>>> >> >> > WDYT?
> >>>>>> >> >> >
> >>>>>> >> >> > J.
> >>>>>> >> >> >
> >>>>>> >> >> >
> >>>>>> >> >> >
> >>>>>> >> >> > --
> >>>>>> >> >> >
> >>>>>> >> >> > Jarek Potiuk
> >>>>>> >> >> > Polidea <https://www.polidea.com/> | Principal Software
> >>>>>> Engineer
> >>>>>> >> >> >
> >>>>>> >> >> > M: +48 660 796 129 <+48660796129>
> >>>>>> >> >> > [image: Polidea] <https://www.polidea.com/>
> >>>>>> >> >> >


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>
