I've just taken a look at the
https://github.com/apache/airflow-pgbouncer-exporter (I'm assuming the
others are the same) and "woah, wait" was my reaction.

Having a repo where we include the Dockerfile and build scripts: I'm
okay with that.

This approach where we have an entire copy of the code and have
essentially forked the upstream project: not happy, verging on a
-1/veto of this approach.

I.e. I'd prefer this repo was just a Dockerfile that pulls the upstream
project from a published release/git tag/pinned commit sha.
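
To make that concrete, here is a minimal sketch of fetching upstream sources
at a pinned ref and checking them against a recorded checksum, instead of
carrying a copy of the code. The owner, repository and ref values are
placeholders, the function names are made up for illustration, and it assumes
GNU coreutils' sha256sum is available:

```shell
#!/bin/sh
# Sketch: fetch upstream sources at a pinned ref instead of vendoring a fork.
# All owner/repo/ref values used with these helpers are hypothetical.

# Print the GitHub source-archive URL for an owner, repo and git ref
# (a release tag or a full commit SHA).
upstream_archive_url() {
    printf 'https://github.com/%s/%s/archive/%s.tar.gz\n' "$1" "$2" "$3"
}

# Succeed only if the SHA-256 of file $1 equals the expected hex digest $2.
# Assumes GNU coreutils' sha256sum.
verify_sha256() {
    actual=$(sha256sum "$1" | cut -d ' ' -f 1)
    [ "$actual" = "$2" ]
}

# Example (not executed here, needs network access):
#   url=$(upstream_archive_url example-owner example-exporter v0.0.0)
#   curl -fsSL "$url" -o upstream.tar.gz
#   verify_sha256 upstream.tar.gz "<digest recorded when the pin was chosen>"
```

The same two steps could equally live in a RUN instruction of the Dockerfile;
the point is only that the repo would record a ref and a checksum, not the
upstream code itself.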

-ash

On Jul 6 2020, at 12:46 pm, Jarek Potiuk <[email protected]> wrote:

> One more comment. I started the discussion in the build devlist of Apache:
> https://lists.apache.org/thread.html/rf2af2a95e7687fe94ede23fe9df388f784c8231a5968b109f677cbe8%40%3Cbuilds.apache.org%3E
> - and so far there are no conclusive answers. It is something that is not
> regulated clearly by ASF rules, it seems.
>  
> So it seems to me we are free to choose what our approach is (for now).
>  
> But I have found this at least:
>  
> https://www.apache.org/legal/release-policy.html#what
>  
> "The Apache Software Foundation produces open source software. All releases
> are in the form of the source materials needed to make changes to the
> software being released. In some cases, binary/bytecode packages are also
> produced as a convenience to users that might not have the appropriate
> tools to build a compiled version of the source. In all such cases, the
> binary/bytecode package must have the same version number as the source
> release and may only add binary/bytecode files that are the result of
> compiling that version of the source code release."
>  
> I think "the spirit" of that chapter is what I have been referring to
> from the beginning of the thread.
>  
> I really think if we give our users a convenient way of using some binary
> packages (i.e. docker images) there should be an easy way to reproduce
> those from sources. I have the feeling that my proposal is simply an
> embodiment of that rule. Glad to hear what others think about it. I am fully
> aware it is a "gray" area, but I think with a very little cost we can move
> it to the "white" area.
>  
> J.
>  
>  
>  
> On Sun, Jul 5, 2020 at 11:42 AM Jarek Potiuk <[email protected]>
> wrote:
>  
>> Hello Everyone,
>>  
>> TL;DR: I did some experiments with those images and I have a workable
>> proposal for how we can handle that.
>>  
>> I already created a few repos to see how it can work and I think I have a
>> workable and rather easy-to-maintain solution. We can still delete this
>> if we choose another way, of course. I just wanted to make sure all of the
>> below is "workable", so I simply implemented a complete, working solution.
>> It's not that complex, but it's good I did it - I found a few things that
>> had to be fixed in the Dockerfiles and build scripts provided by upstream
>> repos, and I also made sure that we are using the latest patched versions
>> of all the tools. In all cases we can rebuild everything from sources - we
>> do not have to rely on some binary that we trust was built from the sources
>> (other than official images).
>>  
>> Happy to hear any comments, but I propose that if the below looks good to
>> you, we get a lazy consensus and I simply implement and document it. I
>> would also make it a rule for our images that we keep that approach for
>> future images.
>>  
>> *More details:*
>>  
>> 1) I brought all the images to the "apache/airflow" DockerHub registry: both
>> dev images and the ones used in the chart. I tried to have a
>> separate "airflowdev" user but it turns out not to work well - it's
>> either a one-user account or an organization with up to three people for
>> free. That would be a bit of a hassle to manage with 2-factor
>> authentication etc. I think it's actually quite good to have an
>> "apache/airflow:helm-unittest-2020.07.10-v0.2.0-v3.1.2" image. Docker
>> works well in this setup and I think it's rather nice to have all the
>> images in one registry.
>>  
>> 2) We have three more repos where I cloned the code for those images that
>> required the "whole" repo and made them standalone - i.e. depending only on
>> official images/binaries released by the organizations "owning" the code in
>> question and the code that is officially released in the official "apt" or
>> "apk" (alpine) repositories. I made some airflow-specific modifications
>> there (labels, maintainer, sometimes some configuration changes, build
>> scripts). Those changes are merged as separate commits - we should be able
>> to bring upstream changes from those repos rather easily if we want. Those
>> are the repos:
>>  
>> * https://github.com/apache/airflow-pgbouncer-exporter
>> * https://github.com/apache/airflow-openldap
>> * https://github.com/apache/airflow-helm-unittest
>>  
>> 3) For those images that did not require a whole separate repository, I
>> created scripts/Dockerfile folders in these two PRs: "chart/dockerfiles
>> <https://github.com/apache/airflow/pull/9650>" directory for "helm"
>> images and "scripts/ci/dockerfiles
>> <https://github.com/apache/airflow/pull/9652>" for CI images.
>>  
>> 4) All the images are based either on "alpine" or "debian-slim" or
>> "ubuntu-slim" images and they are optimized for size.
>>  
>> 5) All the images keep similar naming conventions and have similar build
>> scripts that you can simply run to rebuild the images from scratch (bumping
>> the versions, bringing upstream changes before as needed). An example build
>> script is below. It will be very easy to upgrade those images as needed and
>> release them separately or all at the same time. Example naming convention:
>>  
>> *apache/airflow:airflow-pgbouncer-2020.07.10-1.14.0*
>>  
>> Legend:
>>  
>> * *pgbouncer* image released by airflow
>> * *1.14.0* - version of pgbouncer
>> * *2020.07.10* - calver version of the image (roughly - the time when the
>> image was released/created by Airflow)
>>  
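The naming convention above can be sketched as a pair of small shell helpers.
This is only an illustration: the function names are made up, and the
DOCKERHUB_USER/DOCKERHUB_REPO defaults are borrowed from the example build
script further down.

```shell
#!/bin/sh
# Sketch of the naming convention
#   <user>/<repo>:airflow-<component>-<calver>-<upstream version>
# Function names are illustrative, not part of any existing script.

# Compose the full image reference from its three parts.
airflow_image_tag() {
    component=$1
    calver=$2
    upstream=$3
    printf '%s/%s:airflow-%s-%s-%s\n' \
        "${DOCKERHUB_USER:-apache}" "${DOCKERHUB_REPO:-airflow}" \
        "$component" "$calver" "$upstream"
}

# Recover the upstream version: everything after the last '-'.
# Note: this breaks if the upstream version itself contains a '-'.
tag_upstream_version() {
    printf '%s\n' "${1##*-}"
}

# With the defaults this prints:
#   apache/airflow:airflow-pgbouncer-2020.07.10-1.14.0
airflow_image_tag pgbouncer 2020.07.10 1.14.0
```

Putting the upstream version last keeps it easy to extract, but an upstream
version such as "0.5.0-1" would need a different separator or a stricter
parser.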
>>  
>> 6) All images have a consistent labeling scheme - including commit SHA
>> used to generate the image:
>>  
>> "Labels": {
>>     "org.apache.airflow.airflow_pgbouncer.version": "2020.07.10",
>>     "org.apache.airflow.commit_sha": "43e6406a84d2589bd54c3c37ceaa0c3ebaa9de26",
>>     "org.apache.airflow.component": "pgbouncer",
>>     "org.apache.airflow.pgbouncer.version": "1.14.0"
>> }
>>  
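As a rough sketch of how such a label set could be generated consistently for
every image: the helper below emits the four labels as `docker build --label`
flags. Passing labels on the command line (rather than through LABEL
instructions fed by build args in the Dockerfile) is an assumption made for
this example, and image_label_args is a made-up name.

```shell
#!/bin/sh
# Sketch: emit the labelling scheme as `docker build --label` flags.
# Arguments: component, upstream version, calver image version, commit SHA.
# image_label_args is a hypothetical helper, not part of the real scripts.
image_label_args() {
    component=$1
    upstream=$2
    calver=$3
    sha=$4
    prefix=org.apache.airflow
    # One flag per line; none of the values may contain whitespace.
    printf '%s\n' \
        "--label ${prefix}.component=${component}" \
        "--label ${prefix}.${component}.version=${upstream}" \
        "--label ${prefix}.airflow_${component}.version=${calver}" \
        "--label ${prefix}.commit_sha=${sha}"
}
```

The output is meant to be word-split into a build command, for example
docker build . $(image_label_args pgbouncer 1.14.0 2020.07.10 "$(git rev-parse HEAD)").
A label can be read back from a built image with
docker inspect --format '{{ index .Config.Labels "org.apache.airflow.commit_sha" }}' IMAGE.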
>>  
>> 7) No regular maintenance is needed for CI images - we can bump them from
>> time to time on an ad-hoc basis or when we need to increase a version. For
>> Helm images I think we should release new versions of those images every
>> time we release the Helm chart - we can then rebuild the images using the
>> latest patches of debian/alpine and latest versions of the software we
>> have in them.
>>  
>> 8) Example build script
>>  
>> #!/usr/bin/env bash
>> # Licensed to the Apache Software Foundation (ASF) under one
>> # ... licence here
>> set -euo pipefail
>> DOCKERHUB_USER=${DOCKERHUB_USER:="apache"}
>> DOCKERHUB_REPO=${DOCKERHUB_REPO:="airflow"}
>> PGBOUNCER_VERSION="1.14.0"
>> AIRFLOW_PGBOUNCER_VERSION="2020.07.10"
>> COMMIT_SHA=$(git rev-parse HEAD)
>>  
>> cd "$( dirname "${BASH_SOURCE[0]}" )" || exit 1
>>  
>>  
>> TAG="${DOCKERHUB_USER}/${DOCKERHUB_REPO}:airflow-pgbouncer-${AIRFLOW_PGBOUNCER_VERSION}-${PGBOUNCER_VERSION}"
>>  
>> docker build . \
>>     --pull \
>>     --build-arg "PGBOUNCER_VERSION=${PGBOUNCER_VERSION}" \
>>     --build-arg "AIRFLOW_PGBOUNCER_VERSION=${AIRFLOW_PGBOUNCER_VERSION}" \
>>     --build-arg "COMMIT_SHA=${COMMIT_SHA}" \
>>     --tag "${TAG}"
>>  
>> docker push "${TAG}"
>>  
>>  
>> J.
>>  
>>  
>>  
>>  
>>  
>>  
>>  
>>  
>>  
>> On Thu, Jul 2, 2020 at 2:12 PM Jarek Potiuk <[email protected]>
>> wrote:
>>  
>>> And the right Greg here :(,
>>>  
>>> J.
>>>  
>>>  
>>>  
>>> On Thu, Jul 2, 2020 at 12:18 PM Jarek Potiuk <[email protected]>
>>> wrote:
>>>  
>>>> Hey Ash, Greg, Daniel,
>>>>  
>>>> So I understand there is no problem with licenses for those images and
>>>> we can get/use the sources for those?
>>>>  
>>>> I would love to add the scripts/Dockerfiles to the sources - to be able
>>>> to rebuild the images. I have some of those already and would like to make
>>>> a PR, but it would be great if we can get the Dockerfile sources. I also
>>>> want to ask a few questions about versions of the base images (some of the
>>>> base images seem to be quite old and there are newer releases, so I wanted
>>>> to check if there is anything to prevent upgrading them).
>>>>  
>>>> J
>>>>  
>>>>  
>>>> On Thu, Jun 25, 2020 at 12:58 PM Jarek Potiuk <[email protected]>
>>>> wrote:
>>>>  
>>>>>  
>>>>> On Thu, Jun 25, 2020 at 12:27 PM Ash Berlin-Taylor <[email protected]>
>>>>> wrote:
>>>>>  
>>>>>> > - apache/airflow:statstd-exporter-2020.6.31
>>>>>> > - apache/airflow:pgbouncer-2020.6.31
>>>>>> > - apache/airflow:pgbouncer-exporter-2020.6.31
>>>>>>
>>>>>> Do we count these as "releases" (i.e. do the PMC need to vote on them)
>>>>>> or not?
>>>>>>  
>>>>>  
>>>>> I think we should. I believe we should make it a part of the regular
>>>>> release and vote together on "airflow + prod image + helm + dependent
>>>>> images".
>>>>> Then we might release each of those separately if needed - with
>>>>> separate voting/process (possibly we can bundle together several different
>>>>> things to release). Hence CalVer might make more sense even if we release
>>>>> them together with 1.10.x or 2.Y (especially since those deps are pretty
>>>>> much independent of the airflow version used). I think for Airflow + Prod
>>>>> image, it makes perfect sense to keep 1.10.* 2.0.* - but for Helm and
>>>>> dependent images - CalVer seems like a better idea.
>>>>>  
>>>>>  
>>>>>> For these I think including the upstream version is useful too (either
>>>>>> as well, or instead) -- that way people can look at the right version of
>>>>>> the upstream docs when looking at what configuration options there are.
>>>>>> So `apache/airflow:pgbouncer-1.8.1-1` or
>>>>>> `apache/airflow:pgbouncer-1.8.1-2020.6.31` (nice date btw :D )
>>>>>>  
>>>>>  
>>>>> Agree. BTW, I wondered if anyone would notice the date ;).
>>>>>  
>>>>>> (FYI For pgbouncer-exporter there are three such projects on github,
>>>>>> Juraj's was picked somewhat randomly)
>>>>>>  
>>>>>> > I think now it's the matter of just following up with the
>>>>>> > releases of pgbouncer and libressl and libressl-dev
>>>>>>  
>>>>>> That's still a fairly big "just". And the ssl libraries aren't the
>>>>>> only sources of security patches needed. Also the act of updating is the
>>>>>> easy part -- it's the notification to know when updates are needed, and
>>>>>> ensuring that they happen in a timely manner, that is the hard part :)
>>>>>>  
>>>>>  
>>>>> True. But I think we have some precedent in our CI/Prod images. We have
>>>>> it currently automated so that they self-maintain and self-upgrade:
>>>>> https://github.com/apache/airflow/blob/master/CI.rst. The current CI
>>>>> automation is done in such a way that we are catching up fairly quickly
>>>>> with the latest python patches - almost without noticing (well, there is
>>>>> a period of a few hours where the builds on CI get slower and people need
>>>>> to update their Breeze images). But other than that it happens
>>>>> automatically and without anyone doing any active work there.
>>>>>  
>>>>> I can take a very similar approach for all the images (both dev and
>>>>> runtime) and add a notification component to notify if any of the
>>>>> upstream deps change. So it will be - from our side - mostly deciding
>>>>> if we should release it out-of-band or wait for a "regular" release.
>>>>>  
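A notification component of that kind could boil down to comparing the pinned
version with the latest upstream one. A minimal sketch, assuming GNU sort with
-V (version sort) is available and that the latest upstream version string has
already been obtained by some other means (fetching it is left out, and
needs_upgrade is a made-up name):

```shell
#!/bin/sh
# Sketch: decide whether a pinned dependency is behind upstream.
# Returns 0 when $2 (latest upstream) is strictly newer than $1 (pinned).
# Relies on `sort -V` (version sort) from GNU coreutils.
needs_upgrade() {
    pinned=$1
    latest=$2
    # Identical versions never need an upgrade.
    [ "$pinned" != "$latest" ] || return 1
    # The last line of the version-sorted pair is the newer of the two.
    newest=$(printf '%s\n%s\n' "$pinned" "$latest" | sort -V | tail -n 1)
    [ "$newest" = "$latest" ]
}

# Example: the chart used pgbouncer 1.8.1 while 1.14.0 was available.
if needs_upgrade 1.8.1 1.14.0; then
    echo "pgbouncer: upgrade available"
fi
```

A plain string comparison would get this wrong (1.8.1 sorts after 1.14.0
lexically), which is why a version-aware sort is used here.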
>>>>> J.
>>>>>  
>>>>>  
>>>>>> On Jun 25 2020, at 11:05 am, Jarek Potiuk <[email protected]>
>>>>>> wrote:
>>>>>>  
>>>>>> > I think I'd feel more comfortable if we have it all under the
>>>>>> > "community" umbrella.
>>>>>> >
>>>>>> >   - For dev images - I think we have a good idea from couchdb. I will
>>>>>> >   make a POC of that and a PR shortly. I already created an airflowdev
>>>>>> >   account on Dockerhub to make it available to PMCs of Airflow and
>>>>>> >   connect it to our repo to automate Dev dependencies.
>>>>>> >   - For the runtime (astronomer) images I took a deeper look and I
>>>>>> >   think it makes perfect sense to add them and release them by the
>>>>>> >   Airflow Community as well:
>>>>>> >
>>>>>> > Here is what is in those images:
>>>>>> >
>>>>>> >   - astronomerinc/ap-statsd-exporter
>>>>>> >   <https://hub.docker.com/layers/astronomerinc/ap-statsd-exporter/latest/images/sha256-69538dc71521489733bb21823505a75a02a4c54d1d07eaa2be9fa7eb58763b7f?context=explore>
>>>>>> >   - this image is just based on the official Prometheus Statsd
>>>>>> >   exporter with an added file "/etc/statsd-exporter/mappings.yml". So
>>>>>> >   the maintenance is mainly about keeping the mapping and possibly
>>>>>> >   upgrading to the latest released prometheus-statsd occasionally. The
>>>>>> >   first one sounds like a good idea for community work, the second we
>>>>>> >   can easily automate - same way as we do for production images. Seems
>>>>>> >   that this one is updated once every few months, so we can easily do
>>>>>> >   that.
>>>>>> >   - astronomerinc/ap-pgbouncer
>>>>>> >   <https://hub.docker.com/layers/astronomerinc/ap-pgbouncer/latest/images/sha256-9820007e1e62eb988cb603929b1eaf0989052cd01b73a3004274b21d143f9654?context=explore>
>>>>>> >   - this is just packaging pgbouncer into an image - this one seems to
>>>>>> >   have been updated more frequently in the past but I think now it's
>>>>>> >   the matter of just following up with the releases of pgbouncer and
>>>>>> >   libressl and libressl-dev
>>>>>> >   - astronomerinc/ap-pgbouncer-exporter
>>>>>> >   <https://hub.docker.com/layers/astronomerinc/ap-pgbouncer-exporter/latest/images/sha256-6e9d4f2d66dafecfd2f29239a1957edb2c953b8299e487ec8b04a96206d2da4e?context=explore>
>>>>>> >   - this is a pgbouncer exporter based on Juraj Bubniak's PGBouncer
>>>>>> >   Prometheus exporter with the libressl and libressl-dev libraries
>>>>>> >   upgraded. Also usually updated every few months. Here I think it
>>>>>> >   would also make sense to bring the source code in to the community
>>>>>> >   for Juraj's image as well.
>>>>>> >
>>>>>> > I also think it would make sense (unlike the dev dependencies) to
>>>>>> > publish all "runtime" deps under the "apache/airflow" repository. That
>>>>>> > would be a bit awkward, but I think it's the least "effort" we need to
>>>>>> > maintain and make sure it is officially "blessed" during the release.
>>>>>> >
>>>>>> > So the proposal I have (if we use calver versioning similar to
>>>>>> > backport packages):
>>>>>> >
>>>>>> >   - apache/airflow:statstd-exporter-2020.6.31
>>>>>> >   - apache/airflow:pgbouncer-2020.6.31
>>>>>> >   - apache/airflow:pgbouncer-exporter-2020.6.31
>>>>>> >
>>>>>> > I am happy to bring it all to our repo and setup automation.
>>>>>> >
>>>>>> > J.
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > On Thu, Jun 25, 2020 at 11:19 AM Ash Berlin-Taylor <[email protected]>
>>>>>> > wrote:
>>>>>> >
>>>>>> >> Wow Kamil, that's an awesome and mature process for a company to
>>>>>> >> take -- I wish more companies treated open source deps that way.
>>>>>> >>
>>>>>> >> As I mentioned in the original Helm PR (but just in a comment left
>>>>>> >> to a review), I left a few of the "support" Docker images as
>>>>>> >> astronomerinc ones as the upstream Docker images are "unmaintained"
>>>>>> >> (that isn't to say the projects are, just that the images aren't
>>>>>> >> re-published in a timely fashion to update openssl etc.)
>>>>>> >>
>>>>>> >> I am happy to replace the astronomerinc support images with others
>>>>>> >> if we want to. I am also happy to clarify/make explicit the license
>>>>>> >> situation that those images are distributed under (Apache 2) if we
>>>>>> >> want to stick with them and let us (Astronomer) carry the burden of
>>>>>> >> patching and updating them -- it is after all part of what people
>>>>>> >> pay us for so we'll be doing it anyway.
>>>>>> >>
>>>>>> >> > Besides, we should provide the possibility to replace "Object
>>>>>> >> > code" with other objects i.e., use of an image from a private
>>>>>> >> > third-party registry.
>>>>>> >>
>>>>>> >> The images to use come from the helm values, so are easily
>>>>>> >> changeable at helm install/upgrade time:
>>>>>> >>
>>>>>> >> https://github.com/apache/airflow/blob/ec0025f35be212b248c284efa04acf2d96845681/chart/values.yaml#L68-L92
>>>>>> >>
>>>>>> >> -ash
>>>>>> >>
>>>>>> >> On Jun 24 2020, at 9:07 am, Kamil Breguła <
>>>>>> [email protected]>
>>>>>> >> wrote:
>>>>>> >>
>>>>>> >> > These files have no information to determine the license. In my
>>>>>> >> > opinion, these images ("Derivative Works") should be treated as
>>>>>> >> > Astronomer's or other users' copyrighted files. Please note that
>>>>>> >> > Astronomer may distribute the images under a different license, but
>>>>>> >> > they need to acknowledge the use of the Foundation or other licensed
>>>>>> >> > software. To do otherwise would be stealing.
>>>>>> >> >
>>>>>> >> > DockerHub is not an Open Source software registry, and we cannot
>>>>>> >> > assume that every image there is available under a license that
>>>>>> >> > allows free use.
>>>>>> >> > **What does this mean for the project?**
>>>>>> >> >
>>>>>> >> > This is incompatible with the Apache license because each runtime
>>>>>> >> > dependency must also be under an Apache-compatible license. These
>>>>>> >> > images are required to run the Helm Chart, so they are its
>>>>>> >> > dependencies. Dependencies that are not compatible with the Apache
>>>>>> >> > license are a problem for our users and prevent the use of this
>>>>>> >> > project.
>>>>>> >> >
>>>>>> >> > **How do we deal with this topic in my organization?**
>>>>>> >> >
>>>>>> >> > We take the topic of copyright very seriously in my organization.
>>>>>> >> One of
>>>>>> >> > the steps we take before publishing a derivative work based
>>>>>> on an
>>>>>> >> > Open-Source license is to audit the source code to see if each
>>>>>> part is
>>>>>> >> > under a license that allows us to use it. If we build images or
>>>>>> artifacts
>>>>>> >> > automatically, we take steps that prevent the accidental
>>>>>> publication
>>>>>> >> > of an
>>>>>> >> > artifact that could contain works that have an incorrect license.
>>>>>> >> >
>>>>>> >> > We do this by building the audited internal registry:
>>>>>> >> > - In the case of Airflow, this is a copy of the source code and
>>>>>> the
>>>>>> >> > necessary PIP libraries stored in the blockchain-based registry
>>>>>> >> > (append-only registry). Any change in such a registry
>>>>>> undergoes a
>>>>>> review
>>>>>> >> > process and must be approved. It is not possible to revert an
>>>>>> approved
>>>>>> >> > change without leaving a trace.
>>>>>> >> > - In the case of Docker images, this means that each image is built
>>>>>> >> > automatically, and no one publishes the images to the image registry
>>>>>> >> > manually (docker push). No step can download files from a registry
>>>>>> >> > that is not auditable.
>>>>>> >> >
>>>>>> >> > Such steps allow you to recreate the software development process,
>>>>>> >> > e.g. in
>>>>>> >> > the case of a court case.
>>>>>> >> >
>>>>>> >> > In our case, it won't be easy to introduce all similar
>>>>>> requirements,
>>>>>> >> > but we
>>>>>> >> > can try to be compatible with them so that organizations that
>>>>>> have the
>>>>>> >> same
>>>>>> >> > requirements can meet them.
>>>>>> >> >
>>>>>> >> > **What should we do?**
>>>>>> >> >
>>>>>> >> > In my opinion, this is similar to using libraries in our
>>>>>> application.
>>>>>> >> > We do
>>>>>> >> > not perform a publisher assessment for every library we use. We
>>>>>> only
>>>>>> >> verify
>>>>>> >> > license compliance.
>>>>>> >> >
>>>>>> >> > On the other hand, it looks different because it is "Object
>>>>>> Code", not
>>>>>> >> > "Source Code". We do not use source code directly, but we
>>>>>> use an
>>>>>> object
>>>>>> >> > prepared by a third party - "Derivative Works".
>>>>>> >> >
>>>>>> >> > In my opinion, relying on any Docker image ("Object Code")
>>>>>> is OK
>>>>>> if they
>>>>>> >> > meet the following requirements:
>>>>>> >> > - The Source Code required to create the object should be publicly
>>>>>> >> > available and should be compatible with the Apache license.
>>>>>> >> > - We should have access to Compilation Information. The Compilation
>>>>>> >> > Information must suffice to ensure that the continued functioning
>>>>>> >> > of the source code is in no case prevented or interfered with solely
>>>>>> >> > because modification has been made.
>>>>>> >> >
>>>>>> >> > Besides, we should provide the possibility to replace "Object
>>>>>> code" with
>>>>>> >> > other objects i.e., use of an image from a private third-party
>>>>>> registry.
>>>>>> >> >
>>>>>> >> > Thanks, Jarek, for paying attention to this issue. I didn't think
>>>>>> >> > about it before, but now I know I couldn't use the Helm Chart in its
>>>>>> >> > current form in any of my work. I am afraid that many members of our
>>>>>> >> > community would face similar problems if they tried to use it in a
>>>>>> >> > production environment.
>>>>>> >> >
>>>>>> >> >
>>>>>> >> > On Mon, Jun 22, 2020 at 3:08 PM Ash Berlin-Taylor <[email protected]
>>>>>> >
>>>>>> >> wrote:
>>>>>> >> >
>>>>>> >> >> Licensing wise there is no issue from me: The astronomerinc
>>>>>> images are
>>>>>> >> >> just re-packaging of the upstream images to apply security fixes
>>>>>> >> so are
>>>>>> >> >> licensed under whatever the original image is (MIT or Apache2
>>>>>> usually,
>>>>>> >> >> else we wouldn't have put them in the helm chart PR)
>>>>>> >> >>
>>>>>> >> >> For background, the reason that we at Astronomer created
>>>>>> >> >> ap-pgbouncer-exporter in the first place is that the upstream
>>>>>> >> >> package does not patch/rebuild to address security vulnerabilities.
>>>>>> >> >> By taking this in to airflow-ext it means we as a project become
>>>>>> >> >> responsible for monitoring and testing that. (And don't be fooled
>>>>>> >> >> in to thinking the free scanners can detect all vulns here, we've
>>>>>> >> >> found them to be of very variable and questionable accuracy.)
>>>>>> >> >>
>>>>>> >> >> That is a non-trivial amount of work for an open source project.
>>>>>> >> >>
>>>>>> >> >> Has this ever caused us any problems outside of Pip/python
>>>>>> dependencies?
>>>>>> >> >> (I'm not aware of any.) For runtime this maybe makes sense
>>>>>> (again, I'm
>>>>>> >> >> not yet convinced), but for test-only/dev-only deps this seems
>>>>>> >> like a
>>>>>> >> >> lot of work that we could better spend on working on
>>>>>> Airflow. If
>>>>>> >> we pin
>>>>>> >> >> versions of docker image used then the only real risk is a
>>>>>> left-pad
>>>>>> >> >> scenario of "I'm deleting all my images" which is a minor risk.
>>>>>> >> >>
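One way to make "pinned" checkable is to require that image references use a
content digest (name@sha256:...) rather than a mutable tag. That reading of
pinning is an assumption, not something the thread settles, and
is_digest_pinned is a made-up helper name. A minimal sketch:

```shell
#!/bin/sh
# Sketch: succeed only if an image reference is pinned by content digest,
# i.e. it has the form <name>@sha256:<64 lowercase hex characters>.
is_digest_pinned() {
    case $1 in
        *@sha256:*) digest=${1##*@sha256:} ;;
        *) return 1 ;;
    esac
    # A SHA-256 digest is exactly 64 hex characters long.
    [ "${#digest}" -eq 64 ] || return 1
    # Reject anything that is not a lowercase hex digit.
    case $digest in
        *[!0-9a-f]*) return 1 ;;
    esac
    return 0
}
```

A digest pin guards against a tag being re-pushed with different content. It
does not help with the "I'm deleting all my images" scenario, where only a
copy under the project's own control would.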
>>>>>> >> >> Does any other project do anything like this? I haven't seen it
>>>>>> >> >> before.
>>>>>> >> >>
>>>>>> >> >> I'd vote for doing nothing and addressing this in specific cases
>>>>>> >> >> when it becomes a problem, because I do not see using third-party
>>>>>> >> >> docker images as a risk. I see it as a time-saving measure.
>>>>>> >> >>
>>>>>> >> >> -ash
>>>>>> >> >>
>>>>>> >> >> On Jun 22 2020, at 1:42 pm, Jarek Potiuk <
>>>>>> [email protected]>
>>>>>> >> wrote:
>>>>>> >> >>
>>>>>> >> >> > Hello everyone,
>>>>>> >> >> >
>>>>>> >> >> > TL;DR; I noticed that we are accumulating some
>>>>>> dependencies to
>>>>>> >> external
>>>>>> >> >> > binaries (downloads and Docker images) which make the Apache
>>>>>> Airflow
>>>>>> >> >> > Community a bit vulnerable to external dependencies.  I would
>>>>>> love
>>>>>> >> your
>>>>>> >> >> > comments/opinions on the proposal I made around this.
>>>>>> >> >> >
>>>>>> >> >> > *More explanation/status:*
>>>>>> >> >> >
>>>>>> >> >> > While depending on binaries that are officially "released" and
>>>>>> >> >> > "managed" by the owning organizations is fine, I think it is a bit
>>>>>> >> >> > risky to depend on the others long term and I think we should aim
>>>>>> >> >> > to bring all those "vulnerable" dependencies into community
>>>>>> >> >> > control.
>>>>>> >> >> >
>>>>>> >> >> > I reviewed all our code (or I think all !) looking for such
>>>>>> >> dependencies
>>>>>> >> >> > and prepared an "umbrella" issue where I proposed the approach
>>>>>> >> we can
>>>>>> >> >> take
>>>>>> >> >> > for all such dependencies.
>>>>>> >> >> >
>>>>>> >> >> > I could have missed some - so if you find others feel
>>>>>> free to
>>>>>> >> comment/add
>>>>>> >> >> > the new ones.
>>>>>> >> >> > All the details are captured here:
>>>>>> >> >> > https://github.com/apache/airflow/issues/9401 - I discussed
>>>>>> the
>>>>>> >> >> > context/motivation/current status and approach we can
>>>>>> take for
>>>>>> those
>>>>>> >> >> > dependencies.
>>>>>> >> >> >
>>>>>> >> >> > A lot of those dependencies just need review and maybe some
>>>>>> >> updates to
>>>>>> >> >> > latest versions. And I do not think there is a lot to discuss
>>>>>> for
>>>>>> >> those.
>>>>>> >> >> >
>>>>>> >> >> > There is one point, however, that requires more deliberate
>>>>>> >> action and
>>>>>> >> >> some
>>>>>> >> >> > decisions I think.
>>>>>> >> >> >
>>>>>> >> >> > We have some dependencies on Docker images that we are using
>>>>>> from
>>>>>> >> various
>>>>>> >> >> > sources:
>>>>>> >> >> > 1) officially maintained images
>>>>>> >> >> > 2) images released by organizations that released them for
>>>>>> their own
>>>>>> >> >> > purpose, but they are not "officially maintained" by those
>>>>>> >> organizations
>>>>>> >> >> > 3) images released by private individuals
>>>>>> >> >> >
>>>>>> >> >> > While 1) is perfectly OK, I think for 2) and 3) we should
>>>>>> bring the
>>>>>> >> >> images
>>>>>> >> >> > to Airflow community management. Here is the list of those
>>>>>> >> images I
>>>>>> >> found
>>>>>> >> >> > that need to be moved to Airflow:
>>>>>> >> >> >
>>>>>> >> >> >   - aneeshkj/helm-unittest
>>>>>> >> >> >   - ashb/apache-rat:0.13-1
>>>>>> >> >> >   - godatadriven/krb5-kdc-server
>>>>>> >> >> >   - polinux/stress (?)
>>>>>> >> >> >   - osixia/openldap:1.2.0
>>>>>> >> >> >   - astronomerinc/ap-statsd-exporter:0.11.0
>>>>>> >> >> >   - astronomerinc/ap-pgbouncer:1.8.1
>>>>>> >> >> >   - astronomerinc/ap-pgbouncer-exporter:0.5.0-1
>>>>>> >> >> >
>>>>>> >> >> >
>>>>>> >> >> > *Proposal*:
>>>>>> >> >> >
>>>>>> >> >> > My proposal is to make a folder in our repository on Github
>>>>>> (continue
>>>>>> >> >> with
>>>>>> >> >> > the mono-repo approach we follow) to keep corresponding
>>>>>> Dockerfiles
>>>>>> >> and
>>>>>> >> >> > scripts that build and release images from there. Now the only
>>>>>> >> >> > question is
>>>>>> >> >> > where to keep those images. We currently have apache/airflow
>>>>>> but I
>>>>>> >> >> > think we
>>>>>> >> >> > should reserve it for airflow images only and we should keep
>>>>>> those
>>>>>> >> images
>>>>>> >> >> > elsewhere. Unfortunately, we cannot have "sub-images" of any
>>>>>> >> sort in
>>>>>> >> >> > DockerHub. We are already abusing a bit the "apache/airflow"
>>>>>> >> >> namespace as
>>>>>> >> >> > we are keeping both CI and production images there (but that's
>>>>>> quite
>>>>>> >> >> > OK as
>>>>>> >> >> > the images are similar).
>>>>>> >> >> >
>>>>>> >> >> > My proposal is to create an *"apache/airflow-ext"* DockerHub
>>>>>> >> >> > repository and keep the images there. It will also be a little
>>>>>> >> >> > abused because we will have to name the images with tags - for
>>>>>> >> >> > example:
>>>>>> >> >> >
>>>>>> >> >> >   - apache/airflow-ext:helm-unittest-[version]
>>>>>> >> >> >   - apache/airflow-ext:apache-rat-[version]
>>>>>> >> >> >
>>>>>> >> >> > I am also open to other names for the repo and to proposals for
>>>>>> >> >> > other ways to handle that.
>>>>>> >> >> >
>>>>>> >> >> > I believe there is no issue with licences for any of those images
>>>>>> >> >> > (Ash, Kaxil, Fokko - some of the images are
>>>>>> >> >> > Astronomer's/GoDataDriven's ones - can you comment on that?) and I
>>>>>> >> >> > believe the licensing on all those images is ok for us to copy with
>>>>>> >> >> > attribution (I will double-check that for other images).
>>>>>> >> >> >
>>>>>> >> >> > WDYT?
>>>>>> >> >> >
>>>>>> >> >> > J.
>>>>>> >> >> >
>>>>>> >> >> >
>>>>>> >> >> >
>>>>>> >> >> > --
>>>>>> >> >> >
>>>>>> >> >> > Jarek Potiuk
>>>>>> >> >> > Polidea <https://www.polidea.com/> | Principal Software
>>>>>> Engineer
>>>>>> >> >> >
>>>>>> >> >> > M: +48 660 796 129 <+48660796129>
>>>>>> >> >> > [image: Polidea] <https://www.polidea.com/>
>>>>>> >> >> >
>>>>>> >> >>
>>>>>> >> >
>>>>>> >>
>>>>>> >
>>>>>> >
