Yeah, I figured that from looking at the commits -- but I think even if
it were a proper fork I wouldn't be a fan of this approach: we'd have
to keep "porting"/merging our changes to update from upstream.

-ash

On Jul 6 2020, at 1:36 pm, Jarek Potiuk <[email protected]> wrote:

> Sure - we could do that as well if we agree on that.
>  
> Just to explain - the repository is really a "fork" of the original one
> with our modifications on top. The only reason it's not an "actual" GitHub
> fork is that I cannot create a fork in the "apache" organisation.
>  
> J.
>  
>  
> On Mon, Jul 6, 2020 at 2:22 PM Ash Berlin-Taylor <[email protected]> wrote:
>  
>> I've just taken a look at the
>> https://github.com/apache/airflow-pgbouncer-exporter (I'm assuming the
>> others are the same) and "woah, wait" was my reaction.
>>  
>> Having a repo where we include the Dockerfile and build scripts: I'm
>> okay with that.
>>  
>> This approach, where we have an entire copy of the code and have
>> essentially forked the upstream project: not happy, verging on a
>> -1/veto of this approach.
>>  
>> I.e. I'd prefer this repo was just a Dockerfile that pulls the upstream
>> project from a published release/git tag/pinned commit sha.
>>  
>> -ash
>>  
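Ash's preference (a repo holding only a Dockerfile that pulls upstream from a published release, git tag or pinned commit SHA) can be enforced in the build script itself. A minimal sketch of the strictest of those options, assuming the upstream revision reaches the Dockerfile through a hypothetical `UPSTREAM_COMMIT_SHA` build arg; the function names are illustrative and not part of any existing Airflow script:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Returns 0 only for a full 40-character hex git commit SHA, so moving refs
# such as "master" are rejected.
is_pinned_sha() {
    local re='^[0-9a-f]{40}$'
    [[ "$1" =~ $re ]]
}

# Prints the "docker build" arguments for a pinned upstream revision, one per
# line. UPSTREAM_COMMIT_SHA is a hypothetical build-arg name that the
# Dockerfile would have to consume (e.g. to check out that commit).
upstream_build_args() {
    local ref="$1"
    if ! is_pinned_sha "${ref}"; then
        echo "refusing to build from an unpinned upstream ref: ${ref}" >&2
        return 1
    fi
    printf '%s\n' --build-arg "UPSTREAM_COMMIT_SHA=${ref}"
}
```

Accepting a published release tag instead would only need a different check in `is_pinned_sha`; the point of the sketch is that the build fails loudly rather than silently tracking a branch.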
>> On Jul 6 2020, at 12:46 pm, Jarek Potiuk <[email protected]> wrote:
>>  
>> > One more comment. I started the discussion on the Apache builds devlist:
>> > https://lists.apache.org/thread.html/rf2af2a95e7687fe94ede23fe9df388f784c8231a5968b109f677cbe8%40%3Cbuilds.apache.org%3E
>> > - and so far there are no conclusive answers. It seems this is not
>> > clearly regulated by ASF rules.
>> >
>> > So it seems to me we are free to choose our approach (for now).
>> >
>> > But I have found this at least:
>> >
>> > https://www.apache.org/legal/release-policy.html#what
>> >
>> > "The Apache Software Foundation produces open source software. All
>> releases
>> > are in the form of the source materials needed to make changes to the
>> > software being released. In some cases, binary/bytecode packages
>> are also
>> > produced as a convenience to users that might not have the appropriate
>> > tools to build a compiled version of the source. In all such cases, the
>> > binary/bytecode package must have the same version number as the source
>> > release and may only add binary/bytecode files that are the result of
>> > compiling that version of the source code release."
>> >
>> > I think "the spirit" of that chapter is something that I am referring
>> > to -
>> > from the beginning of the thread.
>> >
>> > I really think if we give our users a convenient way of using some binary
>> > packages (i.e. docker images) there should be an easy way to reproduce
>> > those from sources. I have the feeling that my proposal is simply an
>> > embodiment of that rule. Glad to hear what other think about it. I am
>> fully
>> > aware it is a "gray" area, but I think with a very little cost we can
>> move
>> > it to the "white" area.
>> >
>> > J.
>> >
>> >
>> >
>> > On Sun, Jul 5, 2020 at 11:42 AM Jarek Potiuk <[email protected]>
>> > wrote:
>> >
>> >> Hello Everyone,
>> >>
>> >> TL;DR: I did some experiments with those images and I have a workable
>> >> proposal on how we can handle that.
>> >>
>> >> I already created a few repos to see how it can work, and I think I have a
>> >> workable and rather easy-to-maintain solution. We can still delete this if
>> >> we choose another way, of course; I just wanted to make sure all of the
>> >> below is "workable", so I implemented a complete, working solution. It's
>> >> not that complex, but it's good that I did it - I found a few things that
>> >> had to be fixed in the Dockerfiles and build scripts provided by upstream
>> >> repos, and I also made sure that we are using the latest patched versions
>> >> of all the tools. In all cases we can rebuild everything from sources - we
>> >> do not have to rely on some binary that we trust was built from the
>> >> sources (other than official images).
>> >>
>> >> Happy to hear any comments, but I propose that if the below looks good to
>> >> you, we get a lazy consensus and I simply implement and document it. I
>> >> would also make it a rule for our images that we keep that approach for
>> >> future images.
>> >>
>> >> *More details:*
>> >>
>> >> 1) I brought all the images to the "apache/airflow" DockerHub repository:
>> >> both dev images and the ones used in the chart. I tried to have a separate
>> >> "airflowdev" user, but it turns out not to be really good - it's either a
>> >> one-user account or an organization with up to three people for free.
>> >> That would be a bit of a hassle to manage, with 2-factor authentication
>> >> etc. I think it's actually quite good to have the
>> >> "apache/airflow:helm-unittest-2020.07.10-v0.2.0-v3.1.2" image. Docker
>> >> works well in this setup and I think it's rather nice to have all the
>> >> images in one repository.
>> >>
>> >> 2) We have three more repos where I cloned the code for those images that
>> >> required a "whole" repo and made them standalone - i.e. depending only on
>> >> official images/binaries released by the organizations "owning" the code
>> >> in question and on code that is officially released in the official "apt"
>> >> or "apk" (Alpine) repositories. I made some Airflow-specific modifications
>> >> there (labels, maintainer, sometimes some configuration changes, build
>> >> scripts). Those changes are merged as separate commits - we should be able
>> >> to bring in upstream changes from those repos rather easily if we want.
>> >> Those are the repos:
>> >>
>> >> * https://github.com/apache/airflow-pgbouncer-exporter
>> >> * https://github.com/apache/airflow-openldap
>> >> * https://github.com/apache/airflow-helm-unittest
>> >>
>> >> 3) For the images that did not require a whole separate repository, I
>> >> created scripts/Dockerfile folders in these two PRs: the "chart/dockerfiles
>> >> <https://github.com/apache/airflow/pull/9650>" directory for "helm"
>> >> images and "scripts/ci/dockerfiles
>> >> <https://github.com/apache/airflow/pull/9652>" for CI images.
>> >>
>> >> 4) All the images are based either on "alpine" or "debian-slim" or
>> >> "ubuntu-slim" images and they are optimized for size.
>> >>
>> >> 5) All the images follow similar naming conventions and have similar build
>> >> scripts that you can simply run to rebuild the images from scratch
>> >> (bumping the versions and bringing in upstream changes beforehand as
>> >> needed). An example build script is below. It will be very easy to upgrade
>> >> those images as needed and release them separately or all at the same
>> >> time. Example naming convention:
>> >>
>> >> *apache/airflow:airflow-pgbouncer-2020.07.10-1.14.0*
>> >>
>> >> Legend:
>> >>
>> >> * *pgbouncer* - image released by Airflow
>> >> * *1.14.0* - version of pgbouncer
>> >> * *2020.07.10* - CalVer version of the image (roughly, the time when the
>> >> image was released/created by Airflow)
>> >>
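The naming convention above can be captured in a small shell function, in the same language as the build script. This is a sketch: the function name and argument order are illustrative, and the zero-padded `YYYY.MM.DD` check is an assumption based on the `2020.07.10` example:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Builds a tag of the form <user>/<repo>:airflow-<component>-<calver>-<upstream>,
# e.g. apache/airflow:airflow-pgbouncer-2020.07.10-1.14.0.
airflow_image_tag() {
    local user="$1" repo="$2" component="$3" calver="$4" upstream_version="$5"
    local re='^[0-9]{4}\.[0-9]{2}\.[0-9]{2}$'
    # Assumed rule: the CalVer part is a zero-padded YYYY.MM.DD date.
    if [[ ! "${calver}" =~ $re ]]; then
        echo "CalVer must look like YYYY.MM.DD, got: ${calver}" >&2
        return 1
    fi
    printf '%s/%s:airflow-%s-%s-%s\n' \
        "${user}" "${repo}" "${component}" "${calver}" "${upstream_version}"
}

airflow_image_tag apache airflow pgbouncer 2020.07.10 1.14.0
# prints: apache/airflow:airflow-pgbouncer-2020.07.10-1.14.0
```

Keeping the upstream version as the last component means the tag still tells users which upstream docs apply, as suggested later in the thread.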
>> >>
>> >> 6) All images have a consistent labeling scheme, including the commit SHA
>> >> used to generate the image:
>> >>
>> >>     "Labels": {
>> >>         "org.apache.airflow.airflow_pgbouncer.version": "2020.07.10",
>> >>         "org.apache.airflow.commit_sha": "43e6406a84d2589bd54c3c37ceaa0c3ebaa9de26",
>> >>         "org.apache.airflow.component": "pgbouncer",
>> >>         "org.apache.airflow.pgbouncer.version": "1.14.0"
>> >>     }
>> >>
>> >>
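The same labels could also be attached at build time with `docker build --label` instead of `LABEL` instructions in each Dockerfile; the thread does not say which mechanism the images use. A minimal sketch that emits those arguments following the scheme above (the helper name is hypothetical):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Prints one "--label" flag and its "key=value" argument per line, following
# the org.apache.airflow.* labeling scheme.
pgbouncer_label_args() {
    local commit_sha="$1" calver="$2" pgbouncer_version="$3"
    printf '%s\n' \
        --label "org.apache.airflow.airflow_pgbouncer.version=${calver}" \
        --label "org.apache.airflow.commit_sha=${commit_sha}" \
        --label "org.apache.airflow.component=pgbouncer" \
        --label "org.apache.airflow.pgbouncer.version=${pgbouncer_version}"
}

# Collect the flags into an array so each value stays a single argument.
mapfile -t LABEL_ARGS < <(pgbouncer_label_args \
    43e6406a84d2589bd54c3c37ceaa0c3ebaa9de26 2020.07.10 1.14.0)

# Example (not executed here): docker build . "${LABEL_ARGS[@]}" --tag "${TAG}"
```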
>> >> 7) No regular maintenance is needed for CI images - we can bump them from
>> >> time to time on an ad-hoc basis or when we need a newer version. For Helm
>> >> images, I think we should release new versions every time we release the
>> >> Helm chart - we can then rebuild the images using the latest patches of
>> >> Debian/Alpine and the latest versions of the software we have in them.
>> >>
>> >> 8) Example build script
>> >>
>> >> #!/usr/bin/env bash
>> >> # Licensed to the Apache Software Foundation (ASF) under one
>> >> # ... licence here
>> >> set -euo pipefail
>> >>
>> >> # Run from the script's own directory so that "git rev-parse" and the
>> >> # Docker build context refer to this repository, wherever it is invoked from.
>> >> cd "$( dirname "${BASH_SOURCE[0]}" )" || exit 1
>> >>
>> >> DOCKERHUB_USER=${DOCKERHUB_USER:="apache"}
>> >> DOCKERHUB_REPO=${DOCKERHUB_REPO:="airflow"}
>> >> PGBOUNCER_VERSION="1.14.0"
>> >> AIRFLOW_PGBOUNCER_VERSION="2020.07.10"
>> >> COMMIT_SHA=$(git rev-parse HEAD)
>> >>
>> >> TAG="${DOCKERHUB_USER}/${DOCKERHUB_REPO}:airflow-pgbouncer-${AIRFLOW_PGBOUNCER_VERSION}-${PGBOUNCER_VERSION}"
>> >>
>> >> docker build . \
>> >>     --pull \
>> >>     --build-arg "PGBOUNCER_VERSION=${PGBOUNCER_VERSION}" \
>> >>     --build-arg "AIRFLOW_PGBOUNCER_VERSION=${AIRFLOW_PGBOUNCER_VERSION}" \
>> >>     --build-arg "COMMIT_SHA=${COMMIT_SHA}" \
>> >>     --tag "${TAG}"
>> >>
>> >> docker push "${TAG}"
>> >>
>> >> J.
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On Thu, Jul 2, 2020 at 2:12 PM Jarek Potiuk <[email protected]>
>> >> wrote:
>> >>
>> >>> And the right Greg here :(,
>> >>>
>> >>> J.
>> >>>
>> >>>
>> >>>
>> >>> On Thu, Jul 2, 2020 at 12:18 PM Jarek Potiuk <[email protected]>
>> >>> wrote:
>> >>>
>> >>>> Hey Ash, Greg, Daniel,
>> >>>>
>> >>>> So I understand there is no problem with licenses for those images and
>> >>>> we can get/use the sources for those?
>> >>>>
>> >>>> I would love to add the scripts/Dockerfiles to the sources - to be able
>> >>>> to rebuild the images. I have some of those already and would like to
>> >>>> make a PR, but it would be great if we could get the Dockerfile sources.
>> >>>> I also want to ask a few questions about versions of the base images
>> >>>> (some of the base images seem to be quite old and there are newer
>> >>>> releases, so I wanted to check if there is anything preventing us from
>> >>>> upgrading them).
>> >>>>
>> >>>> J
>> >>>>
>> >>>>
>> >>>> On Thu, Jun 25, 2020 at 12:58 PM Jarek Potiuk <[email protected]>
>> >>>> wrote:
>> >>>>
>> >>>>>
>> >>>>> On Thu, Jun 25, 2020 at 12:27 PM Ash Berlin-Taylor <[email protected]>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> > - apache/airflow:statstd-exporter-2020.6.31
>> >>>>>> > - apache/airflow:pgbouncer-2020.6.31
>> >>>>>> > - apache/airflow:pgbouncer-exporter-2020.6.31
>> >>>>>>
>> >>>>>> Do we count these as "releases" (i.e. do the PMC need to vote on them)
>> >>>>>> or not?
>> >>>>>>
>> >>>>>>
>> >>>>>
>> >>>>> I think we should. I believe we should make it a part of the regular
>> >>>>> release and vote together on "airflow + prod image + helm + dependent
>> >>>>> images". Then we might release each of those separately if needed - with
>> >>>>> a separate voting/process (possibly we can bundle several different
>> >>>>> things to release together). Hence CalVer might make more sense even if
>> >>>>> we release them together with 1.10.x or 2.Y (especially since those deps
>> >>>>> are pretty much independent of the Airflow version used). For Airflow +
>> >>>>> Prod image, I think it makes perfect sense to keep 1.10.* and 2.0.* - but
>> >>>>> for Helm and dependent images, CalVer seems like a better idea.
>> >>>>>
>> >>>>>
>> >>>>>> For these I think including the upstream version is useful too (either
>> >>>>>> as well, or instead) -- that way people can look at the right version of
>> >>>>>> the upstream docs when looking at what configuration options there are.
>> >>>>>> So `apache/airflow:pgbouncer-1.8.1-1` or
>> >>>>>> `apache/airflow:pgbouncer-1.8.1-2020.6.31` (nice date btw :D )
>> >>>>>>
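Ash's aside about "2020.6.31" is a fair catch: June has 30 days, so that tag names a date that never existed. A check like the one below could reject such a tag before it is pushed. It is a sketch, assuming tags use `YYYY.M.D` or zero-padded `YYYY.MM.DD` (both forms appear in this thread); the function name is hypothetical:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Returns 0 if $1 is a real calendar date written as YYYY.M.D or YYYY.MM.DD.
is_valid_calver() {
    local re='^([0-9]{4})\.([0-9]{1,2})\.([0-9]{1,2})$' y m d max
    [[ "$1" =~ $re ]] || return 1
    # "10#" forces base 10 so that zero-padded parts like "08" are accepted.
    y=$((10#${BASH_REMATCH[1]}))
    m=$((10#${BASH_REMATCH[2]}))
    d=$((10#${BASH_REMATCH[3]}))
    (( m >= 1 && m <= 12 )) || return 1
    case "${m}" in
        4|6|9|11) max=30 ;;
        2)  # Gregorian leap-year rule
            if (( (y % 4 == 0 && y % 100 != 0) || y % 400 == 0 )); then
                max=29
            else
                max=28
            fi ;;
        *) max=31 ;;
    esac
    (( d >= 1 && d <= max ))
}
```

For example, `is_valid_calver 2020.6.30` succeeds while `is_valid_calver 2020.6.31` fails.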
>> >>>>>
>> >>>>> Agree. BTW, I wondered if anyone would notice the date ;).
>> >>>>>
>> >>>>>> (FYI for pgbouncer-exporter there are three such projects on GitHub,
>> >>>>>> Juraj's was picked somewhat randomly)
>> >>>>>>
>> >>>>>> > I think now it's the matter of just following up with the
>> >>>>>> > releases of pgbouncer and libressl and libressl-dev
>> >>>>>>
>> >>>>>> That's still a fairly big "just". And these SSL libraries aren't the
>> >>>>>> only sources of security patches needed. Also the act of updating is the
>> >>>>>> easy part -- it's the notification to know when updates are needed, and
>> >>>>>> ensuring that they happen in a timely manner, that is the hard part :)
>> >>>>>>
>> >>>>>
>> >>>>> True. But I think we have some precedent in our CI/Prod images. We
>> >>>>> currently have it automated so that they self-maintain and self-upgrade:
>> >>>>> https://github.com/apache/airflow/blob/master/CI.rst. The current CI
>> >>>>> automation is done in a way that we catch up fairly quickly with the
>> >>>>> latest Python patches - almost without noticing (well, there is a period
>> >>>>> of a few hours when the builds on CI get slower and people need to update
>> >>>>> their Breeze images). But other than that it happens automatically and
>> >>>>> without anyone doing any active work there.
>> >>>>>
>> >>>>> I can take a very similar approach for all the images (both dev and
>> >>>>> runtime) and add a notification component to notify us if any of the
>> >>>>> upstream deps change. So from our side it will mostly be deciding whether
>> >>>>> we should release out-of-band or wait for the "regular" release.
>> >>>>>
>> >>>>> J.
>> >>>>>
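The notification component is not described in detail in the thread. As a rough sketch of its core check, assuming some external step (not shown, and hypothetical) supplies the latest upstream version string, GNU `sort -V` can decide whether an update is available; a plain lexical sort would wrongly rank 1.14.0 below 1.8.1:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Returns 0 if version $1 is strictly newer than version $2.
# Relies on "sort -V" (version sort) from GNU coreutils.
version_newer() {
    [[ "$1" != "$2" ]] &&
        [[ "$(printf '%s\n' "$1" "$2" | sort -V | tail -n 1)" == "$1" ]]
}

# Prints a notice when the latest upstream version is newer than the pinned one.
# How "latest" is obtained (release feed, API, ...) is deliberately left out.
check_upstream() {
    local name="$1" pinned="$2" latest="$3"
    if version_newer "${latest}" "${pinned}"; then
        echo "${name}: pinned ${pinned}, upstream ${latest} available"
    fi
}

check_upstream pgbouncer 1.8.1 1.14.0
# prints: pgbouncer: pinned 1.8.1, upstream 1.14.0 available
```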
>> >>>>>
>> >>>>>> On Jun 25 2020, at 11:05 am, Jarek Potiuk <[email protected]>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>> > I think I'd feel more comfortable if we had it all under the
>> >>>>>> > "community" umbrella.
>> >>>>>> >
>> >>>>>> >   - For dev images - I think we have a good idea from CouchDB. I will
>> >>>>>> >   make a POC of that and a PR shortly. I already created the airflowdev
>> >>>>>> >   account on DockerHub and will make it available to the Airflow PMC
>> >>>>>> >   and connect it to our repo to automate dev dependencies.
>> >>>>>> >   - For the runtime (astronomer) images, I took a deeper look and I
>> >>>>>> >   think it makes perfect sense to add them and have them released by
>> >>>>>> >   the Airflow community as well.
>> >>>>>> >
>> >>>>>> > Here is what is in those images:
>> >>>>>> >
>> >>>>>> >   - astronomerinc/ap-statsd-exporter
>> >>>>>> >   <https://hub.docker.com/layers/astronomerinc/ap-statsd-exporter/latest/images/sha256-69538dc71521489733bb21823505a75a02a4c54d1d07eaa2be9fa7eb58763b7f?context=explore>
>> >>>>>> >   - this image is just based on the official Prometheus statsd
>> >>>>>> >   exporter with an added file "/etc/statsd-exporter/mappings.yml". So
>> >>>>>> >   the maintenance is mainly about keeping the mapping up to date and
>> >>>>>> >   occasionally upgrading to the latest released prometheus-statsd. The
>> >>>>>> >   first sounds like a good task for community work; the second we can
>> >>>>>> >   easily automate, the same way as we do for production images. It seems
>> >>>>>> >   that this one is updated once every few months, so we can easily do
>> >>>>>> >   that.
>> >>>>>> >   - astronomerinc/ap-pgbouncer
>> >>>>>> >   <https://hub.docker.com/layers/astronomerinc/ap-pgbouncer/latest/images/sha256-9820007e1e62eb988cb603929b1eaf0989052cd01b73a3004274b21d143f9654?context=explore>
>> >>>>>> >   - this is just packaging pgbouncer into an image. This one seems to
>> >>>>>> >   have been updated more frequently in the past, but I think now it's
>> >>>>>> >   just a matter of following up with the releases of pgbouncer and
>> >>>>>> >   libressl and libressl-dev.
>> >>>>>> >   - astronomerinc/ap-pgbouncer-exporter
>> >>>>>> >   <https://hub.docker.com/layers/astronomerinc/ap-pgbouncer-exporter/latest/images/sha256-6e9d4f2d66dafecfd2f29239a1957edb2c953b8299e487ec8b04a96206d2da4e?context=explore>
>> >>>>>> >   - this is a pgbouncer exporter based on Juraj Bubniak's PGBouncer
>> >>>>>> >   Prometheus exporter, with the libressl and libressl-dev libraries
>> >>>>>> >   upgraded. Also usually updated every few months. Here I think it
>> >>>>>> >   would also make sense to bring the source code of Juraj's image into
>> >>>>>> >   the community as well.
>> >>>>>> >
>> >>>>>> > I also think it would make sense (unlike for the dev dependencies) to
>> >>>>>> > publish all "runtime" deps under the "apache/airflow" repository. That
>> >>>>>> > would be a bit awkward, but I think it's the least-effort way to
>> >>>>>> > maintain them and make sure they are officially "blessed" during the
>> >>>>>> > release.
>> >>>>>> >
>> >>>>>> > So the proposal I have (if we use CalVer versioning similar to backport
>> >>>>>> > packages):
>> >>>>>> >
>> >>>>>> >   - apache/airflow:statstd-exporter-2020.6.31
>> >>>>>> >   - apache/airflow:pgbouncer-2020.6.31
>> >>>>>> >   - apache/airflow:pgbouncer-exporter-2020.6.31
>> >>>>>> >
>> >>>>>> > I am happy to bring it all to our repo and setup automation.
>> >>>>>> >
>> >>>>>> > J.
>> >>>>>> >
>> >>>>>> >
>> >>>>>> >
>> >>>>>> > On Thu, Jun 25, 2020 at 11:19 AM Ash Berlin-Taylor <[email protected]>
>> >>>>>> > wrote:
>> >>>>>> >
>> >>>>>> >> Wow Kamil, that's an awesome and mature process for a company to
>> >>>>>> >> take -- I wish more companies treated open source deps that way.
>> >>>>>> >>
>> >>>>>> >> As I mentioned in the original Helm PR (though just in a comment left
>> >>>>>> >> on a review), I left a few of the "support" Docker images as
>> >>>>>> >> astronomerinc ones because the upstream Docker images are
>> >>>>>> >> "unmaintained" (that isn't to say the projects are, just that the
>> >>>>>> >> images aren't re-published in a timely fashion to update openssl etc.)
>> >>>>>> >>
>> >>>>>> >> I am happy to replace the astronomerinc support images with others if
>> >>>>>> >> we want to. I am also happy to clarify/make explicit the license
>> >>>>>> >> situation that those images are distributed under (Apache 2) if we
>> >>>>>> >> want to stick with them and let us (Astronomer) carry the burden of
>> >>>>>> >> patching and updating them -- it is, after all, part of what people
>> >>>>>> >> pay us for, so we'll be doing it anyway.
>> >>>>>> >>
>> >>>>>> >> > Besides, we should provide the possibility to replace "Object
>> >>>>>> >> > code" with other objects i.e., use of an image from a private
>> >>>>>> >> > third-party registry.
>> >>>>>> >>
>> >>>>>> >> The images to use come from the Helm values, so they are easily
>> >>>>>> >> changeable at helm install/upgrade time:
>> >>>>>> >>
>> >>>>>> >> https://github.com/apache/airflow/blob/ec0025f35be212b248c284efa04acf2d96845681/chart/values.yaml#L68-L92
>> >>>>>> >>
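Because the image references come from the chart values, a user with a private registry could override them per component at install time. A minimal sketch that turns an image reference into `helm --set` arguments; the `images.<name>.repository` / `images.<name>.tag` keys are an assumption about the values layout at the linked commit and should be checked against values.yaml, and digest references (`@sha256:`) are deliberately rejected:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Prints "helm --set" arguments (one per line) that point chart component
# <name> at the given image reference. The images.<name>.repository and
# images.<name>.tag keys are assumed, not taken from the chart itself.
helm_image_overrides() {
    local name="$1" ref="$2" repo tag
    if [[ "${ref}" == *@* ]]; then
        echo "digest references are not handled: ${ref}" >&2
        return 1
    fi
    # A tag is present only if the last path segment contains a colon; this
    # keeps a registry port such as "registry.example.com:5000" intact.
    if [[ "${ref##*/}" == *:* ]]; then
        repo="${ref%:*}"
        tag="${ref##*:}"
    else
        repo="${ref}"
        tag="latest"   # Docker's default tag when none is given
    fi
    printf '%s\n' \
        --set "images.${name}.repository=${repo}" \
        --set "images.${name}.tag=${tag}"
}

# Example (not executed here; "./chart" is a hypothetical path):
#   mapfile -t HELM_ARGS < <(helm_image_overrides pgbouncer \
#       registry.example.com:5000/team/pgbouncer:1.14.0)
#   helm upgrade --install airflow ./chart "${HELM_ARGS[@]}"
```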
>> >>>>>> >> -ash
>> >>>>>> >>
>> >>>>>> >> On Jun 24 2020, at 9:07 am, Kamil Breguła <[email protected]>
>> >>>>>> >> wrote:
>> >>>>>> >>
>> >>>>>> >> > These files have no information to determine the license. In my
>> >>>>>> >> > opinion, these images ("Derivative Works") should be treated as
>> >>>>>> >> > Astronomer's or other users' copyrighted files. Please note that
>> >>>>>> >> > Astronomer may distribute the images under a different license, but
>> >>>>>> >> > they need to acknowledge the use of the Foundation's or other
>> >>>>>> >> > licensed software. To do otherwise would be stealing.
>> >>>>>> >> >
>> >>>>>> >> > DockerHub is not an Open Source software registry, and we cannot
>> >>>>>> >> > assume that every image there is available under a license that
>> >>>>>> >> > allows free use.
>> >>>>>> >> >
>> >>>>>> >> > **What does this mean for the project?**
>> >>>>>> >> >
>> >>>>>> >> > This is incompatible with the Apache license, because each runtime
>> >>>>>> >> > dependency must also be under an Apache-compatible license. These
>> >>>>>> >> > images are required to run the Helm Chart, so they are its
>> >>>>>> >> > dependencies. Dependencies that are not compatible with the Apache
>> >>>>>> >> > license are a problem for our users and prevent the use of this
>> >>>>>> >> > project.
>> >>>>>> >> >
>> >>>>>> >> > **How do we deal with this topic in my organization?**
>> >>>>>> >> >
>> >>>>>> >> > We take the topic of copyright very seriously in my organization.
>> >>>>>> >> > One of the steps we take before publishing a derivative work based
>> >>>>>> >> > on an Open-Source license is to audit the source code to see if each
>> >>>>>> >> > part is under a license that allows us to use it. If we build images
>> >>>>>> >> > or artifacts automatically, we take steps that prevent the
>> >>>>>> >> > accidental publication of an artifact that could contain works that
>> >>>>>> >> > have an incorrect license.
>> >>>>>> >> >
>> >>>>>> >> > We do this by building an audited internal registry:
>> >>>>>> >> > - In the case of Airflow, this is a copy of the source code and the
>> >>>>>> >> > necessary PIP libraries stored in a blockchain-based registry
>> >>>>>> >> > (append-only registry). Any change in such a registry undergoes a
>> >>>>>> >> > review process and must be approved. It is not possible to revert
>> >>>>>> >> > an approved change without leaving a trace.
>> >>>>>> >> > - In the case of Docker images, this means that each image is built
>> >>>>>> >> > automatically, and no one publishes images to the image registry
>> >>>>>> >> > manually (docker push). No step can download files from a registry
>> >>>>>> >> > that is not auditable.
>> >>>>>> >> >
>> >>>>>> >> > Such steps allow you to recreate the software development process,
>> >>>>>> >> > e.g. in the event of a court case.
>> >>>>>> >> >
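One way to approximate the "no unauditable downloads" rule for base images is to require every `FROM` line to pin its image by digest, so a rebuild cannot silently pick up different content. A rough sketch, assuming plain `FROM <image>` lines; it does not understand `--platform` flags or references to earlier build stages, and the function name is hypothetical:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Prints every FROM image in the given Dockerfile that is not pinned by a
# sha256 digest, and returns 1 if there was at least one such image.
check_from_pinned() {
    local dockerfile="$1" status=0 line="" image=""
    local re='^[[:space:]]*[Ff][Rr][Oo][Mm][[:space:]]+([^[:space:]]+)'
    # "|| [[ -n ... ]]" also processes a final line without a trailing newline.
    while IFS= read -r line || [[ -n "${line}" ]]; do
        if [[ "${line}" =~ $re ]]; then
            image="${BASH_REMATCH[1]}"
            if [[ "${image}" != *@sha256:* ]]; then
                echo "not pinned by digest: ${image}"
                status=1
            fi
        fi
    done < "${dockerfile}"
    return "${status}"
}
```

Run in CI before `docker build`, a non-zero exit would stop the pipeline before anything is pushed.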
>> >>>>>> >> > In our case, it won't be easy to introduce all similar requirements,
>> >>>>>> >> > but we can try to be compatible with them so that organizations that
>> >>>>>> >> > have the same requirements can meet them.
>> >>>>>> >> >
>> >>>>>> >> > **What should we do?**
>> >>>>>> >> >
>> >>>>>> >> > In my opinion, this is similar to using libraries in our
>> >>>>>> >> > application. We do not perform a publisher assessment for every
>> >>>>>> >> > library we use. We only verify license compliance.
>> >>>>>> >> >
>> >>>>>> >> > On the other hand, it looks different because it is "Object Code",
>> >>>>>> >> > not "Source Code". We do not use source code directly, but we use an
>> >>>>>> >> > object prepared by a third party - "Derivative Works".
>> >>>>>> >> >
>> >>>>>> >> > In my opinion, relying on any Docker image ("Object Code") is OK if
>> >>>>>> >> > it meets the following requirements:
>> >>>>>> >> > - The Source Code required to create the object should be publicly
>> >>>>>> >> > available and should be compatible with the Apache license.
>> >>>>>> >> > - We should have access to the Compilation Information. The
>> >>>>>> >> > Compilation Information must suffice to ensure that the continued
>> >>>>>> >> > functioning of the source code is in no case prevented or interfered
>> >>>>>> >> > with solely because modification has been made.
>> >>>>>> >> >
>> >>>>>> >> > Besides, we should provide the possibility to replace "Object
>> >>>>>> >> > code" with other objects i.e., use of an image from a private
>> >>>>>> >> > third-party registry.
>> >>>>>> >> >
>> >>>>>> >> > Thanks, Jarek, for paying attention to this issue. I didn't think
>> >>>>>> >> > about it before, but now I know I couldn't use the Helm Chart in
>> >>>>>> >> > its current form in any of my work. I am afraid that many members
>> >>>>>> >> > of our community would face similar problems if they tried to use it
>> >>>>>> >> > in a production environment.
>> >>>>>> >> >
>> >>>>>> >> >
>> >>>>>> >> > On Mon, Jun 22, 2020 at 3:08 PM Ash Berlin-Taylor <[email protected]>
>> >>>>>> >> > wrote:
>> >>>>>> >> >
>> >>>>>> >> >> Licensing-wise there is no issue from me: the astronomerinc images
>> >>>>>> >> >> are just re-packaging of the upstream images to apply security
>> >>>>>> >> >> fixes, so they are licensed under whatever the original image is
>> >>>>>> >> >> (MIT or Apache2 usually, else we wouldn't have put them in the Helm
>> >>>>>> >> >> chart PR).
>> >>>>>> >> >>
>> >>>>>> >> >> For background, the reason that we at Astronomer created
>> >>>>>> >> >> ap-pgbouncer-exporter in the first place is that the upstream
>> >>>>>> >> >> package does not patch/rebuild to address security vulnerabilities.
>> >>>>>> >> >> By taking this into airflow-ext, we as a project become responsible
>> >>>>>> >> >> for monitoring and testing that. (And don't be fooled into thinking
>> >>>>>> >> >> the free scanners can detect all vulns here; we've found them to be
>> >>>>>> >> >> of very variable and questionable accuracy.)
>> >>>>>> >> >>
>> >>>>>> >> >> That is a non-trivial amount of work for an open source project.
>> >>>>>> >> >>
>> >>>>>> >> >> Has this ever caused us any problems outside of Pip/Python
>> >>>>>> >> >> dependencies? (I'm not aware of any.) For runtime this maybe makes
>> >>>>>> >> >> sense (again, I'm not yet convinced), but for test-only/dev-only deps
>> >>>>>> >> >> this seems like a lot of work that we could better spend on working
>> >>>>>> >> >> on Airflow. If we pin the versions of the Docker images used, then
>> >>>>>> >> >> the only real risk is a left-pad scenario of "I'm deleting all my
>> >>>>>> >> >> images", which is a minor risk.
>> >>>>>> >> >>
>> >>>>>> >> >> Do any other projects do anything like this? I haven't seen it
>> >>>>>> >> >> before.
>> >>>>>> >> >>
>> >>>>>> >> >> I'd vote for doing nothing and addressing this in specific cases
>> >>>>>> >> >> when it becomes a problem, because I do not see using third-party
>> >>>>>> >> >> Docker images as a risk. I see it as a time-saving measure.
>> >>>>>> >> >>
>> >>>>>> >> >> -ash
>> >>>>>> >> >>
>> >>>>>> >> >> On Jun 22 2020, at 1:42 pm, Jarek Potiuk <[email protected]>
>> >>>>>> >> >> wrote:
>> >>>>>> >> >>
>> >>>>>> >> >> > Hello everyone,
>> >>>>>> >> >> >
>> >>>>>> >> >> > TL;DR; I noticed that we are accumulating some dependencies on
>> >>>>>> >> >> > external binaries (downloads and Docker images), which make the
>> >>>>>> >> >> > Apache Airflow Community a bit vulnerable to external
>> >>>>>> >> >> > dependencies. I would love your comments/opinions on the proposal
>> >>>>>> >> >> > I made around this.
>> >>>>>> >> >> >
>> >>>>>> >> >> > *More explanation/status:*
>> >>>>>> >> >> >
>> >>>>>> >> >> > While depending on binaries officially "released" and "managed"
>> >>>>>> >> >> > by the owning organizations is fine, I think it is a bit risky to
>> >>>>>> >> >> > depend on the remaining ones long term, and I think we should aim
>> >>>>>> >> >> > to bring all those "vulnerable" dependencies into community
>> >>>>>> >> >> > control.
>> >>>>>> >> >> >
>> >>>>>> >> >> > I reviewed all our code (or I think all!) looking for such
>> >>>>>> >> >> > dependencies and prepared an "umbrella" issue where I proposed the
>> >>>>>> >> >> > approach we can take for all such dependencies.
>> >>>>>> >> >> >
>> >>>>>> >> >> > I could have missed some - so if you find others, feel free to
>> >>>>>> >> >> > comment/add the new ones.
>> >>>>>> >> >> > All the details are captured here:
>> >>>>>> >> >> > https://github.com/apache/airflow/issues/9401 - I discussed the
>> >>>>>> >> >> > context/motivation/current status and the approach we can take for
>> >>>>>> >> >> > those dependencies.
>> >>>>>> >> >> >
>> >>>>>> >> >> > A lot of those dependencies just need review and maybe some updates
>> >>>>>> >> >> > to the latest versions. And I do not think there is a lot to discuss
>> >>>>>> >> >> > for those.
>> >>>>>> >> >> >
>> >>>>>> >> >> > There is one point, however, that requires more deliberate action
>> >>>>>> >> >> > and some decisions, I think.
>> >>>>>> >> >> >
>> >>>>>> >> >> > We have some dependencies on Docker images that we are using from
>> >>>>>> >> >> > various sources:
>> >>>>>> >> >> > 1) officially maintained images
>> >>>>>> >> >> > 2) images released by organizations for their own purposes, but
>> >>>>>> >> >> > not "officially maintained" by those organizations
>> >>>>>> >> >> > 3) images released by private individuals
>> >>>>>> >> >> >
>> >>>>>> >> >> > While 1) is perfectly OK, I think for 2) and 3) we should bring the
>> >>>>>> >> >> > images under Airflow community management. Here is the list of
>> >>>>>> >> >> > those images I found that need to be moved to Airflow:
>> >>>>>> >> >> >
>> >>>>>> >> >> >   - aneeshkj/helm-unittest
>> >>>>>> >> >> >   - ashb/apache-rat:0.13-1
>> >>>>>> >> >> >   - godatadriven/krb5-kdc-server
>> >>>>>> >> >> >   - polinux/stress (?)
>> >>>>>> >> >> >   - osixia/openldap:1.2.0
>> >>>>>> >> >> >   - astronomerinc/ap-statsd-exporter:0.11.0
>> >>>>>> >> >> >   - astronomerinc/ap-pgbouncer:1.8.1
>> >>>>>> >> >> >   - astronomerinc/ap-pgbouncer-exporter:0.5.0-1
>> >>>>>> >> >> >
>> >>>>>> >> >> >
>> >>>>>> >> >> > *Proposal*:
>> >>>>>> >> >> >
>> >>>>>> >> >> > My proposal is to make a folder in our repository on GitHub
>> >>>>>> >> >> > (continuing the mono-repo approach we follow) to keep the
>> >>>>>> >> >> > corresponding Dockerfiles and scripts that build and release images
>> >>>>>> >> >> > from there. Now the only question is where to keep those images. We
>> >>>>>> >> >> > currently have apache/airflow, but I think we should reserve it for
>> >>>>>> >> >> > Airflow images only and keep those images elsewhere.
>> >>>>>> >> >> > Unfortunately, we cannot have "sub-images" of any sort in
>> >>>>>> >> >> > DockerHub. We are already abusing the "apache/airflow" namespace a
>> >>>>>> >> >> > bit, as we are keeping both CI and production images there (but
>> >>>>>> >> >> > that's quite OK as the images are similar).
>> >>>>>> >> >> >
>> >>>>>> >> >> > My proposal is to create an *"apache/airflow-ext"* DockerHub
>> >>>>>> >> >> > repository and keep the images there. It will also be a little
>> >>>>>> >> >> > abused, because we will have to name the images with tags - for
>> >>>>>> >> >> > example:
>> >>>>>> >> >> >
>> >>>>>> >> >> >   - apache/airflow-ext:helm-unittest-[version]
>> >>>>>> >> >> >   - apache/airflow-ext:apache-rat-[version]
>> >>>>>> >> >> >
>> >>>>>> >> >> > I am also open to other names for the repo and to proposals for
>> >>>>>> >> >> > other ways to handle that.
>> >>>>>> >> >> >
>> >>>>>> >> >> > I believe there is no issue with licenses for any of those images
>> >>>>>> >> >> > (Ash, Kaxil, Fokko - some of the images are Astronomer's/
>> >>>>>> >> >> > GoDataDriven's ones - can you comment on that?), and I believe the
>> >>>>>> >> >> > licensing of all those images is OK for us to copy with attribution
>> >>>>>> >> >> > (I will double-check that for the other images).
>> >>>>>> >> >> >
>> >>>>>> >> >> > WDYT?
>> >>>>>> >> >> >
>> >>>>>> >> >> > J.
>> >>>>>> >> >> >
>> >>>>>> >> >> >
>> >>>>>> >> >> >
>> >>>>>> >> >> > --
>> >>>>>> >> >> >
>> >>>>>> >> >> > Jarek Potiuk
>> >>>>>> >> >> > Polidea <https://www.polidea.com/> | Principal Software
>> >>>>>> Engineer
>> >>>>>> >> >> >
>> >>>>>> >> >> > M: +48 660 796 129 <+48660796129>
>> >>>>>> >> >> > [image: Polidea] <https://www.polidea.com/>
>> >>>>>> >> >> >
>> >>>>>> >> >>
>> >>>>>> >> >
>> >>>>>> >>
>> >>>>>> >
>> >>>>>> >
>> >>>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>
>> >>
>> >
>>  
>  
>  
>
