Yeah I figured that from looking at the commits -- but I think even if it was a proper fork I wouldn't be a fan of this approach: we'd have to keep "porting"/merging our changes to update from upstream.
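
To make the alternative concrete, here is a rough sketch of what "pull upstream at a pinned ref" could look like instead of carrying a copy of the code. This is an illustration only: the function names (is_pinned_ref, fetch_pinned) are made up for this example, and the version-tag pattern is an assumption about how upstream names its releases. The idea is that our repo keeps only the Dockerfile and a small script like this one, so there is nothing to merge when upstream moves; we just bump the ref.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the tag pattern are hypothetical, not an
# existing Airflow script.
set -euo pipefail

# Succeed if the ref is a full 40-character commit SHA or a version-like tag
# (e.g. v1.2.3 or 1.14.0). Floating refs such as "master" are rejected.
is_pinned_ref() {
    local ref="$1"
    local sha_re='^[0-9a-f]{40}$'
    local tag_re='^v?[0-9]+(\.[0-9]+)+$'
    [[ "${ref}" =~ ${sha_re} ]] || [[ "${ref}" =~ ${tag_re} ]]
}

# Fetch exactly one pinned ref of an upstream repo into a new directory.
# Fetching by commit SHA only works if the server allows it; tags always work.
fetch_pinned() {
    local repo="$1" ref="$2" dest="$3"
    if ! is_pinned_ref "${ref}"; then
        echo "refusing to build from unpinned ref: ${ref}" >&2
        return 1
    fi
    git init --quiet "${dest}"
    git -C "${dest}" fetch --quiet --depth 1 "${repo}" "${ref}"
    git -C "${dest}" checkout --quiet FETCH_HEAD
}

# Example use (placeholders, not real values):
#   fetch_pinned "<upstream git URL>" "<release tag or commit sha>" ./upstream-src
#   docker build ./upstream-src --file ./Dockerfile --tag "<image tag>"
```

Updating to a newer upstream then means changing one ref in our repo and rebuilding, rather than rebasing our commits on top of a copied tree.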
-ash On Jul 6 2020, at 1:36 pm, Jarek Potiuk <[email protected]> wrote: > Sure - we could do that as well if we agree on that. > > Just to explain - the repository is really a "fork" of the original one > with our modifications on top. The only reason it's not an "actual" github > fork was that I cannot do a fork in "apache" organisation. > > J. > > > On Mon, Jul 6, 2020 at 2:22 PM Ash Berlin-Taylor <[email protected]> wrote: > >> I've just taken a look at the >> https://github.com/apache/airflow-pgbouncer-exporter (I'm assuming the >> others are the same) and "woah, wait" was my reaction. >> >> Having a repo where we include the Dockerfile and build scripts: I'm >> okay with that. >> >> This approach where we have an entire copy of the code and have >> essentially forked the the upstream project: not happy verging on a >> -1/veto of this approach. >> >> I.e. I'd prefer this repo was just a Dockerfile that pulls the upstream >> project from a published release/git tag/pinned commit sha. >> >> -ash >> >> On Jul 6 2020, at 12:46 pm, Jarek Potiuk <[email protected]> wrote: >> >> > One more comment. I started the discussion in the build devlist of >> Apache: >> > >> https://lists.apache.org/thread.html/rf2af2a95e7687fe94ede23fe9df388f784c8231a5968b109f677cbe8%40%3Cbuilds.apache.org%3E >> > - and so far there are no conclusive answers. Iy is something that >> is not >> > regulated clearly by ASF rules it seems, >> > >> > So seems to me we are free to choose what our approach is (for now): >> > >> > But I have found this at least: >> > >> > https://www.apache.org/legal/release-policy.html#what >> > >> > "The Apache Software Foundation produces open source software. All >> releases >> > are in the form of the source materials needed to make changes to the >> > software being released. In some cases, binary/bytecode packages >> are also >> > produced as a convenience to users that might not have the appropriate >> > tools to build a compiled version of the source. 
In all such cases, the >> > binary/bytecode package must have the same version number as the source >> > release and may only add binary/bytecode files that are the result of >> > compiling that version of the source code release." >> > >> > I think "the spirit" of that chapter is something that I am referring >> > to - >> > from the beginning of the thread. >> > >> > I really think if we give our users a convenient way of using some binary >> > packages (i.e. docker images) there should be an easy way to reproduce >> > those from sources. I have the feeling that my proposal is simply an >> > embodiment of that rule. Glad to hear what other think about it. I am >> fully >> > aware it is a "gray" area, but I think with a very little cost we can >> move >> > it to the "white" area. >> > >> > J. >> > >> > >> > >> > On Sun, Jul 5, 2020 at 11:42 AM Jarek Potiuk <[email protected]> >> > wrote: >> > >> >> Hello Everyone, >> >> >> >> TL;DR: I did some experiments with those images and I have a >> proposal on >> >> how we can handle that. I have a workable proposal. >> >> >> >> I already created a few repos to see how it can work and I think I >> >> have a >> >> workable and rather easy to maintain the solution. We can still >> >> delete this >> >> if we choose another way, of course, I just wanted to make sure all >> below >> >> is "workable" and I simply implemented a complete, working solution. >> It's >> >> not as complex, but it's good I was doing it - I found a few >> things that >> >> had to be fixed in Dockerfiles and build scripts provided by upstream >> >> repos, I also made sure that we are using the latest patched >> versions of >> >> all the tools. In all cases we can rebuild everything from sources - >> >> we do >> >> not have to rely on some binary that we trust was build from the sources >> >> (other than official images).. 
>> >> >> >> Happy to hear any comments, but I propose that if the below looks >> >> good to >> >> you, we get a lazy consensus and I simply implement and document >> it. I >> >> would also make it a rule for our images that we keep that >> approach for >> >> future images. >> >> >> >> *More details:* >> >> >> >> 1) I brought all the images to "apache/airlfow" DockerHub >> registry: both >> >> dev images and the ones used in the chart. I tried to have a >> >> separate "airflowdev" user but it turns out to be not really good >> - it's >> >> either one-user account or organization with up to three people for >> free. >> >> That would be a bit hassle with 2-factor authentication etc. to >> >> manage it. >> >> I think it's actually quite good to have >> >> "apache/airflow:helm-unittest-2020.07.10-v0.2.0-v3.1.2". image. Docker >> >> works well in this setup and I think it's rather nice to have all the >> >> images in one registry. >> >> >> >> 2) we have three more repos where I cloned the code for those images >> that >> >> required "whole" repo and made them standalone - i.e. depending >> only on >> >> official images/binaries released by organizations "owning" the >> code in >> >> questions and the code that is officially released in the official >> >> "apt" or >> >> "apk" (alpine) repositories). I made some airflow specific modifications >> >> there (labels, maintainer, sometimes some configuration changes, build >> >> scripts). Those changes are merged as separate commits - we should be >> able >> >> to bring upstream changes from those repos rather easily if we want. 
>> Those >> >> are the repos: >> >> >> >> * https://github.com/apache/airflow-pgbouncer-exporter >> >> * https://github.com/apache/airflow-openldap >> >> * https://github.com/apache/airflow-helm-unittest >> >> >> >> 3) Those images that did not require a whole separate repository, I >> >> created scripts/Dockerfile folders in those two PRs: "chart/dockerfiles >> >> <https://github.com/apache/airflow/pull/9650>" directory for "helm" >> >> images and "scripts/ci/dockerfiles >> >> <https://github.com/apache/airflow/pull/9652>" for CI images. >> >> >> >> 4) All the images are based either on "alpine" or "debian-slim" or >> >> "ubuntu-slim" images and they are optimized for size. >> >> >> >> 5) All the images keep similar naming conventions and have similar build >> >> scripts that you can simply run to rebuild the images from scratch >> (bumping >> >> the versions, bringing upstream changes before as needed). An example >> build >> >> script is below. It will be very easy to upgrade those images as >> >> needed and >> >> release them separately or all at the same time. 
Example naming >> convention: >> >> >> >> *apache/airflow:airflow-pgbouncer-2020.07.10-1.14.0* >> >> >> >> Legend: >> >> >> >> * *pgbouncer* image released by airflow >> >> * *1.14.0* - version of pgbouncer >> >> * *2020.07.10* - calver version of the image (roughly - the time when >> the >> >> image was released/created by Airflow) >> >> >> >> >> >> 6) All images have a consistent labeling scheme - including commit SHA >> >> used to generate the image: >> >> >> >> >> >> >> >> >> >> >> >> >> >> * "Labels": { >> >> "org.apache.airflow.airflow_pgbouncer.version": "2020.07.10", >> >> "org.apache.airflow.commit_sha": >> >> "43e6406a84d2589bd54c3c37ceaa0c3ebaa9de26", >> >> "org.apache.airflow.component": "pgbouncer", >> >> "org.apache.airflow.pgbouncer.version": "1.14.0" }* >> >> >> >> >> >> 7) No regular maintenance is needed for CI images - we can bump them >> from >> >> time to time on an ad-hoc basis or when we need to increase >> version. For >> >> Helm images I think we should release new versions of those images every >> >> time we release Helm chart - we can then rebuild the images using the >> >> latest patches of debian/alpine and latest versions of the software >> >> we have >> >> in them. >> >> >> >> 8) Example build script >> >> >> >> #!/usr/bin/env bash >> >> # Licensed to the Apache Software Foundation (ASF) under one >> >> # ... licence here >> >> set -euo pipefail >> >> DOCKERHUB_USER=${DOCKERHUB_USER:="apache"} >> >> DOCKERHUB_REPO=${DOCKERHUB_REPO:="airflow"} >> >> PGBOUNCER_VERSION="1.14.0" >> >> AIRFLOW_PGBOUNCER_VERSION="2020.07.10" >> >> COMMIT_SHA=$(git rev-parse HEAD) >> >> >> >> cd "$( dirname "${BASH_SOURCE[0]}" )" || exit 1 >> >> >> >> >> >> >> TAG="${DOCKERHUB_USER}/${DOCKERHUB_REPO}:airflow-pgbouncer-${AIRFLOW_PGBOUNCER_VERSION}-${PGBOUNCER_VERSION}" >> >> >> >> docker build . 
\ >> >> --pull \ >> >> --build-arg "PGBOUNCER_VERSION=${PGBOUNCER_VERSION}" \ >> >> --build-arg >> "AIRFLOW_PGBOUNCER_VERSION=${AIRFLOW_PGBOUNCER_VERSION}"\ >> >> --build-arg "COMMIT_SHA=${COMMIT_SHA}" \ >> >> --tag "${TAG}" >> >> >> >> docker push "${TAG}" >> >> >> >> >> >> J. >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> On Thu, Jul 2, 2020 at 2:12 PM Jarek Potiuk <[email protected]> >> >> wrote: >> >> >> >>> And the right Greg here :(, >> >>> >> >>> J. >> >>> >> >>> >> >>> >> >>> On Thu, Jul 2, 2020 at 12:18 PM Jarek Potiuk <[email protected] >> > >> >>> wrote: >> >>> >> >>>> Hey Ash, Greg, Daniel, >> >>>> >> >>>> So I understand there is no problem with licenses for those >> images and >> >>>> we can get/use the sources for those? >> >>>> >> >>>> I would love to add the scripts/Dockerfiles to the sources - to be >> able >> >>>> to rebuild the images. I have some of those already and would like >> >>>> to make >> >>>> a PR, but It would be great if we can get the Dockerfile sources. >> >>>> I also >> >>>> want to ask a few questions about versions of the base images (some >> >>>> of the >> >>>> base images seem to be quite old and there are newer releases so I >> wanted >> >>>> to check if there is anything to prevent upgrading them). >> >>>> >> >>>> J >> >>>> >> >>>> >> >>>> On Thu, Jun 25, 2020 at 12:58 PM Jarek Potiuk < >> [email protected]> >> >>>> wrote: >> >>>> >> >>>>> >> >>>>> On Thu, Jun 25, 2020 at 12:27 PM Ash Berlin-Taylor <[email protected]> >> >>>>> wrote: >> >>>>> >> >>>>>> > - apache/airflow:statstd-exporter-2020.6.31 >> >>>>>> > - apache/airflow:pgbouncer-2020.6.31 >> >>>>>> > - apache/airflow:pgbouncer-exporter-2020.6.31 >> >>>>> >> >>>>> Do we count these as "releases" (i.e. do the PMC need to vote on >> them) >> >>>>>> or not? >> >>>>>> >> >>>>> >> >>>>> I think we should. I believe we should make it a part of regular >> >>>>> release and vote together on "airflow + prod image + helm + dependent >> >>>>> images". 
>> >>>>> Then we might release each of those separately if needed - with >> >>>>> separate voting/process (possibly we can bundle together several >> different >> >>>>> things to release). Hence CalVer might make more sense even if we >> release >> >>>>> them together with 1.10.x or 2.Y (especially that those deps are >> pretty >> >>>>> much independent from the airflow version used). I think for >> >>>>> Airflow + Prod >> >>>>> image, it makes perfect sense to keep 1.10.* 2.0.* - but for >> Helm and >> >>>>> dependent images - CalVer seems like a better idea. >> >>>>> >> >>>>> >> >>>>> For these I think including the upstream version is useful too >> (either >> >>>>>> as well, or instead) -- that way people can look at the right >> version >> >>>>>> of >> >>>>>> the upstream docs when looking at what configuration options >> >>>>>> there are. >> >>>>>> so `apache/airflow:pgbouncer-1.8.1-1` or >> >>>>>> `apache/airflow:pgbouncer-1.8.1-2020.6.31` (nice date btw :D ) >> >>>>>> >> >>>>> >> >>>>> Agree. BTW. I wondered if anyone notices the date ;). >> >>>>> >> >>>>> (FYI For pgbouncer-exporter there are three such projects on github, >> >>>>>> Juraj's was picked somewhat randomly) >> >>>>>> >> >>>>>> > I think now it's the matter of just following up with the >> >>>>>> > releases of pgbouncer and libressl and libressl-dev >> >>>>>> >> >>>>>> That's still a fairly big "just". And there ssl libraries >> aren't the >> >>>>>> only sources of security patches needed. Also the act of >> updating is >> >>>>>> the >> >>>>>> easy part -- its the notification to know when updates are >> >>>>>> needed, and >> >>>>>> ensuring that they happen in a timely manner that is the hard >> >>>>>> part :) >> >>>>>> >> >>>>> >> >>>>> True. But I think we have some precedent in our CI/Prod images. We >> have >> >>>>> it currently automated so that they self-maintain ad self-upgrade: >> >>>>> https://github.com/apache/airflow/blob/master/CI.rst. 
The >> current CI >> >>>>> automation is done in the way that we are catching up fairly >> >>>>> quickly with >> >>>>> the latest python patches - almost without noticing (well there is >> >>>>> a few >> >>>>> hours period where the builds on CI get slower and people need to >> update >> >>>>> their Breeze images). But other than that it happens automatically >> and >> >>>>> without anyone doing any active work there. >> >>>>> >> >>>>> I can do a very similar approach for all the images (both dev and >> >>>>> runtime) and add a notification component to notify if any of the >> >>>>> upstreaming deps changes. So it will be - from our side - mostly >> deciding >> >>>>> if we should release it out-of-the-bands or wait for "regular" >> release. >> >>>>> >> >>>>> J. >> >>>>> >> >>>>> >> >>>>>> On Jun 25 2020, at 11:05 am, Jarek Potiuk <[email protected] >> > >> >>>>>> wrote: >> >>>>>> >> >>>>>> > I think I'd feel more comfortable if we have it all under >> >>>>>> "community" >> >>>>>> > umbrella. >> >>>>>> > >> >>>>>> > - For dev images - I think we have a good idea from >> couchdb. I >> >>>>>> will make >> >>>>>> > a POC of that and PR shortly. I already created airflowdev >> account >> >>>>>> on >> >>>>>> > Dockerhub and make it available to PMCs of Airlfow and >> >>>>>> connect it >> >>>>>> to our >> >>>>>> > repo to automate Dev dependencies. 
>> >>>>>> > - For the runtime (astronomer) images I took a deeper look >> >>>>>> and I >> >>>>>> think >> >>>>>> > it makes perfect sense to add them and release by Airflow >> Community >> >>>>>> > as well: >> >>>>>> > >> >>>>>> > Here is what is in those images: >> >>>>>> > >> >>>>>> > - astronomerinc/ap-statsd-exporter >> >>>>>> > < >> >>>>>> >> https://hub.docker.com/layers/astronomerinc/ap-statsd-exporter/latest/images/sha256-69538dc71521489733bb21823505a75a02a4c54d1d07eaa2be9fa7eb58763b7f?context=explore >> >>>>>> > >> >>>>>> > - this image is just based on the official Prometheus Statsd >> >>>>>> > exported with >> >>>>>> > added file "/etc/statsd-exporter/mappings.yml". So the >> maintenance >> >>>>>> is >> >>>>>> > mainly about keeping the mapping and possibly upgrade to lates >> >>>>>> released >> >>>>>> > prometheus-statsd occasionally. The first one sounds like >> a good >> >>>>>> > idea for >> >>>>>> > community work, the second we can easily automate - same way >> >>>>>> as we >> >>>>>> > do for >> >>>>>> > production images. Seems that this one is updated once >> every few >> >>>>>> > months, so >> >>>>>> > we can easily do that. 
astronomerinc/ap-pgbouncer:latest >> >>>>>> > - astronomerinc/ap-pgbouncer >> >>>>>> > < >> >>>>>> >> https://hub.docker.com/layers/astronomerinc/ap-pgbouncer/latest/images/sha256-9820007e1e62eb988cb603929b1eaf0989052cd01b73a3004274b21d143f9654?context=explore >> >>>>>> > >> >>>>>> > - this is just packaging pgbouncer into an image - this one >> seems >> >>>>>> to be >> >>>>>> > updated more frequently in the past but I think now it's the >> matter >> >>>>>> > of just >> >>>>>> > following up with the releases of pgbouncer and libressl and >> >>>>>> lbressl-dev >> >>>>>> > >> >>>>>> > < >> >>>>>> >> https://hub.docker.com/layers/astronomerinc/ap-pgbouncer-exporter/latest/images/sha256-6e9d4f2d66dafecfd2f29239a1957edb2c953b8299e487ec8b04a96206d2da4e?context=explore >> >>>>>> > >> >>>>>> > - astronomerinc/ap-pgbouncer-exporter >> >>>>>> > < >> >>>>>> >> https://hub.docker.com/layers/astronomerinc/ap-pgbouncer-exporter/latest/images/sha256-6e9d4f2d66dafecfd2f29239a1957edb2c953b8299e487ec8b04a96206d2da4e?context=explore >> >>>>>> > >> >>>>>> > - this is pgbouncer exporter based on Juraj Bubniak's PGBouncer >> >>>>>> Prometheus >> >>>>>> > exporter with libressl and libressl-dev library upgraded. Also >> >>>>>> usually >> >>>>>> > updated every few months. Here I think it would also make >> >>>>>> sense to >> >>>>>> bring >> >>>>>> > the source code in to the community for Juraj's image as well. >> >>>>>> > >> >>>>>> > I also think it would make sense (unlike the dev >> dependencies) to >> >>>>>> publish >> >>>>>> > all "runtime" devs under the "apache/airflow" repository. That >> would >> >>>>>> > be a >> >>>>>> > bit awkward, but I think it's the least "effort" we need to >> maintain >> >>>>>> and >> >>>>>> > make sure it is officially "blessed" during the release. 
>> >>>>>> > >> >>>>>> > So the proposal I have (if we use calver versioning similar to >> >>>>>> backport >> >>>>>> > packages): >> >>>>>> > >> >>>>>> > - apache/airflow:statstd-exporter-2020.6.31 >> >>>>>> > - apache/airflow:pgbouncer-2020.6.31 >> >>>>>> > - apache/airflow:pgbouncer-exporter-2020.6.31 >> >>>>>> > >> >>>>>> > I am happy to bring it all to our repo and setup automation. >> >>>>>> > >> >>>>>> > J. >> >>>>>> > >> >>>>>> > >> >>>>>> > >> >>>>>> > On Thu, Jun 25, 2020 at 11:19 AM Ash Berlin-Taylor < >> [email protected]> >> >>>>>> wrote: >> >>>>>> > >> >>>>>> >> Wow Kamil that's an awesome and mature processs for a >> company to >> >>>>>> take -- >> >>>>>> >> I wish more companies treated open source deps that way. >> >>>>>> >> >> >>>>>> >> As I mentioned in the original Helm PR (but just in a comment >> left >> >>>>>> to a >> >>>>>> >> review), I left a few of the "support" Docker images as >> >>>>>> astronomerinc >> >>>>>> >> ones as the upstream Docker images are "unmaintained" (that isn't >> >>>>>> to say >> >>>>>> >> the projects are, just that the images aren't re-published >> in a >> >>>>>> timely >> >>>>>> >> fashion to update openssl etc.) >> >>>>>> >> >> >>>>>> >> I am happy to replace the astronomerinc support images with >> others >> >>>>>> if we >> >>>>>> >> want to. I am also happy to clarify/make explicit the license >> >>>>>> situation >> >>>>>> >> that those images are distributed under (Apache 2) if we >> want to >> >>>>>> stick >> >>>>>> >> with them and let us (Astronomer) carry the burden of patching >> and >> >>>>>> >> updating them -- it is after all part of what people pay us >> >>>>>> for so >> >>>>>> we'll >> >>>>>> >> be doing it anyway. >> >>>>>> >> >> >>>>>> >> > Besides, we should provide the possibility to replace "Object >> >>>>>> code" with >> >>>>>> >> > other objects i.e., use of an image from a private third-party >> >>>>>> registry. 
>> >>>>>> >> >> >>>>>> >> The images to use come from the helm values, so are easily >> >>>>>> changable at >> >>>>>> >> helm install/upgrade time: >> >>>>>> >> >> >>>>>> >> >> >>>>>> >> >> >>>>>> >> https://github.com/apache/airflow/blob/ec0025f35be212b248c284efa04acf2d96845681/chart/values.yaml#L68-L92 >> >>>>>> >> >> >>>>>> >> -ash >> >>>>>> >> >> >>>>>> >> On Jun 24 2020, at 9:07 am, Kamil Breguła < >> >>>>>> [email protected]> >> >>>>>> >> wrote: >> >>>>>> >> >> >>>>>> >> > These files have no information to determine the license. >> >>>>>> In my >> >>>>>> opinion, >> >>>>>> >> > these images ("Derivative Works") should be treated as >> >>>>>> Astronomer's or >> >>>>>> >> > other users' copyrighted files. Please note that >> Astronomer may >> >>>>>> >> distribute >> >>>>>> >> > the images under a different license, but they need to >> >>>>>> acknowledge the >> >>>>>> >> use >> >>>>>> >> > of the Foundation or other licensed software. To do otherwise >> >>>>>> would be >> >>>>>> >> > stealing. >> >>>>>> >> > >> >>>>>> >> > DockerHub is not an Open Source software registry, and we >> cannot >> >>>>>> assume >> >>>>>> >> > that every image there is available under a license that allows >> >>>>>> >> free use. >> >>>>>> >> > >> >>>>>> >> > **What does this mean for the project?** >> >>>>>> >> > >> >>>>>> >> > This is incompatible with the Apache license because each >> runtime >> >>>>>> >> > dependencies must also be based on the Apache-compatible >> license. >> >>>>>> These >> >>>>>> >> > images are required to run the Helm Chart, so are its >> dependencies >> >>>>>> >> > Dependencies that are not compatible with the Apache license >> >>>>>> are a >> >>>>>> >> problem >> >>>>>> >> > for our users and prevent the use of this project. >> >>>>>> >> > >> >>>>>> >> > **How do we deal with this topic in my organization?** >> >>>>>> >> > >> >>>>>> >> > We take the topic of copyright very seriously in my >> organization. 
>> >>>>>> >> One of >> >>>>>> >> > the steps we take before publishing a derivative work based >> >>>>>> on an >> >>>>>> >> > Open-Source license is to audit the source code to see if each >> >>>>>> part is >> >>>>>> >> > under a license that allows us to use it. If we build >> images or >> >>>>>> artifacts >> >>>>>> >> > automatically, we take steps that prevent the accidental >> >>>>>> publication >> >>>>>> >> > of an >> >>>>>> >> > artifact that could contain works that have an incorrect >> license. >> >>>>>> >> > >> >>>>>> >> > We do this by building the audited internal registry: >> >>>>>> >> > - In the case of Airflow, this is a copy of the source >> code and >> >>>>>> the >> >>>>>> >> > necessary PIP libraries stored in the blockchain-based registry >> >>>>>> >> > (append-only registry). Any change in such a registry >> >>>>>> undergoes a >> >>>>>> review >> >>>>>> >> > process and must be approved. It is not possible to >> revert an >> >>>>>> approved >> >>>>>> >> > change without leaving a trace. >> >>>>>> >> > - In the case of Docker images, this means that each >> image is >> >>>>>> built >> >>>>>> >> > automatically, and no one publishes the images to images >> register >> >>>>>> >> manually >> >>>>>> >> > (docker push). No step can download files from a registry >> >>>>>> that is >> >>>>>> not >> >>>>>> >> > auditable. >> >>>>>> >> > >> >>>>>> >> > Such steps allow you to recreate the software development >> process, >> >>>>>> >> > e.g. in >> >>>>>> >> > the case of a court case. >> >>>>>> >> > >> >>>>>> >> > In our case, it won't be easy to introduce all similar >> >>>>>> requirements, >> >>>>>> >> > but we >> >>>>>> >> > can try to be compatible with them so that organizations that >> >>>>>> have the >> >>>>>> >> same >> >>>>>> >> > requirements can meet them. >> >>>>>> >> > >> >>>>>> >> > **What should we do?** >> >>>>>> >> > >> >>>>>> >> > In my opinion, this is similar to using libraries in our >> >>>>>> application. 
>> >>>>>> >> > We do >> >>>>>> >> > not perform a publisher assessment for every library we >> use. We >> >>>>>> only >> >>>>>> >> verify >> >>>>>> >> > license compliance. >> >>>>>> >> > >> >>>>>> >> > On the other hand, it looks different because it is "Object >> >>>>>> Code", not >> >>>>>> >> > "Source Code". We do not use source code directly, but we >> >>>>>> use an >> >>>>>> object >> >>>>>> >> > prepared by a third party - "Derivative Works". >> >>>>>> >> > >> >>>>>> >> > In my opinion, relying on any Docker image ("Object Code") >> >>>>>> is OK >> >>>>>> if they >> >>>>>> >> > meet the following requirements: >> >>>>>> >> > - The Source Code required to create the object should be >> publicly >> >>>>>> >> > available and should be compatible with the Apache license. >> >>>>>> >> > - We should have s access to Compilation Information. The >> >>>>>> Compilation >> >>>>>> >> > Information must suffice to ensure that the continued >> functioning >> >>>>>> >> of the >> >>>>>> >> > source code is in no case prevented or interfered with solely >> >>>>>> because >> >>>>>> >> > modification has been made. >> >>>>>> >> > >> >>>>>> >> > Besides, we should provide the possibility to replace "Object >> >>>>>> code" with >> >>>>>> >> > other objects i.e., use of an image from a private third-party >> >>>>>> registry. >> >>>>>> >> > >> >>>>>> >> > Thank Jarek for paying attention to this issue. I didn't think >> >>>>>> >> about it >> >>>>>> >> > before, but now I know I couldn't use the Helm Chart in its >> >>>>>> current >> >>>>>> >> > form in >> >>>>>> >> > any of my work. I am afraid that many members of our community >> >>>>>> >> would face >> >>>>>> >> > similar problems if they tried to use it in a production >> >>>>>> environment. 
>> >>>>>> >> > >> >>>>>> >> > >> >>>>>> >> > On Mon, Jun 22, 2020 at 3:08 PM Ash Berlin-Taylor < >> [email protected] >> >>>>>> > >> >>>>>> >> wrote: >> >>>>>> >> > >> >>>>>> >> >> Licensing wise there is no issue from me: The astronomerinc >> >>>>>> images are >> >>>>>> >> >> just re-packaging of the upstream images to apply security >> fixes >> >>>>>> >> so are >> >>>>>> >> >> licensed under whatever the original image is (MIT or Apache2 >> >>>>>> usually, >> >>>>>> >> >> else we wouldn't have put them in the helm chart PR) >> >>>>>> >> >> >> >>>>>> >> >> For background, the reason that we at Astronomer created >> >>>>>> >> >> ap-pgbouncer-exporter in the first place is that the upstream >> >>>>>> package >> >>>>>> >> >> does not patch/rebuild to address security >> vulnerabilities. By >> >>>>>> taking >> >>>>>> >> >> this in to airflow-ext it means we as a project become >> >>>>>> responsible for >> >>>>>> >> >> monitoring and testing that. (And don't be fooled in to >> thinking >> >>>>>> the >> >>>>>> >> >> free scanners can detect all vulns here, we've found them >> >>>>>> to be >> >>>>>> >> very of >> >>>>>> >> >> variable, and questionable accuracy.) >> >>>>>> >> >> >> >>>>>> >> >> That is a non-trivial amount of work for an open source >> project. >> >>>>>> >> >> >> >>>>>> >> >> Has this ever caused us any problems outside of Pip/python >> >>>>>> dependencies? >> >>>>>> >> >> (I'm not aware of any.) For runtime this maybe makes sense >> >>>>>> (again, I'm >> >>>>>> >> >> not yet convinced), but for test-only/dev-only deps this seems >> >>>>>> >> like a >> >>>>>> >> >> lot of work that we could better spend on working on >> >>>>>> Airflow. If >> >>>>>> >> we pin >> >>>>>> >> >> versions of docker image used then the only real risk is a >> >>>>>> left-pad >> >>>>>> >> >> scenario of "I'm deleting all my images" which is a minor >> risk. >> >>>>>> >> >> >> >>>>>> >> >> Do any other project do anything like this? I haven't >> seen it >> >>>>>> before. 
>> >>>>>> >> >> >> >>>>>> >> >> I'd vote for doing nothing and addressing this in specific >> cases >> >>>>>> >> when it >> >>>>>> >> >> becomes a problem. Because I do not see using thidy party >> docker >> >>>>>> images >> >>>>>> >> >> as a risk. I see it as a time saving measure. >> >>>>>> >> >> >> >>>>>> >> >> -ash >> >>>>>> >> >> >> >>>>>> >> >> On Jun 22 2020, at 1:42 pm, Jarek Potiuk < >> >>>>>> [email protected]> >> >>>>>> >> wrote: >> >>>>>> >> >> >> >>>>>> >> >> > Hello everyone, >> >>>>>> >> >> > >> >>>>>> >> >> > TL;DR; I noticed that we are accumulating some >> >>>>>> dependencies to >> >>>>>> >> external >> >>>>>> >> >> > binaries (downloads and Docker images) which make the Apache >> >>>>>> Airflow >> >>>>>> >> >> > Community a bit vulnerable to external dependencies. I >> would >> >>>>>> love >> >>>>>> >> your >> >>>>>> >> >> > comments/opinions on the proposal I made around this. >> >>>>>> >> >> > >> >>>>>> >> >> > *More explanation/status:* >> >>>>>> >> >> > >> >>>>>> >> >> > While dependence is fine for officially "released" and >> >>>>>> "managed" by >> >>>>>> >> the >> >>>>>> >> >> > owning organizations, I think it is a bit risky to >> depend on >> >>>>>> those >> >>>>>> >> long >> >>>>>> >> >> > term and I think we should aim to bring all those >> "vulnerable" >> >>>>>> >> >> dependencies >> >>>>>> >> >> > into community control. >> >>>>>> >> >> > >> >>>>>> >> >> > I reviewed all our code (or I think all !) looking for such >> >>>>>> >> dependencies >> >>>>>> >> >> > and prepared an "umbrella" issue where I proposed the >> approach >> >>>>>> >> we can >> >>>>>> >> >> take >> >>>>>> >> >> > for all such dependencies. >> >>>>>> >> >> > >> >>>>>> >> >> > I could have missed some - so if you find others feel >> >>>>>> free to >> >>>>>> >> comment/add >> >>>>>> >> >> > the new ones. 
>> >>>>>> >> >> > All the details are captured here: >> >>>>>> >> >> > https://github.com/apache/airflow/issues/9401 - I discussed >> >>>>>> the >> >>>>>> >> >> > context/motivation/current status and approach we can >> >>>>>> take for >> >>>>>> those >> >>>>>> >> >> > dependencies. >> >>>>>> >> >> > >> >>>>>> >> >> > A lot of those dependencies just need review and maybe some >> >>>>>> >> updates to >> >>>>>> >> >> > latest versions. And I do not think there is a lot to >> discuss >> >>>>>> for >> >>>>>> >> those. >> >>>>>> >> >> > >> >>>>>> >> >> > There is one point, however, that requires more deliberate >> >>>>>> >> action and >> >>>>>> >> >> some >> >>>>>> >> >> > decisions I think. >> >>>>>> >> >> > >> >>>>>> >> >> > We have some dependencies on Docker images that we are using >> >>>>>> from >> >>>>>> >> various >> >>>>>> >> >> > sources: >> >>>>>> >> >> > 1) officially maintained images >> >>>>>> >> >> > 2) images released by organizations that released them for >> >>>>>> their own >> >>>>>> >> >> > purpose, but they are not "officially maintained" by those >> >>>>>> >> organizations >> >>>>>> >> >> > 3) images released by private individuals >> >>>>>> >> >> > >> >>>>>> >> >> > While 1) is perfectly OK, I think for 2) and 3) we should >> >>>>>> bring the >> >>>>>> >> >> images >> >>>>>> >> >> > to Airflow community management. Here is the list of those >> >>>>>> >> images I >> >>>>>> >> found >> >>>>>> >> >> > that need to be moved to Airflow: >> >>>>>> >> >> > >> >>>>>> >> >> > - aneeshkj/helm-unittest >> >>>>>> >> >> > - ashb/apache-rat:0.13-1 >> >>>>>> >> >> > - godatadriven/krb5-kdc-server >> >>>>>> >> >> > - polinux/stress (?) 
>> >>>>>> >> >> > - osixia/openldap:1.2.0 >> >>>>>> >> >> > - astronomerinc/ap-statsd-exporter:0.11.0 >> >>>>>> >> >> > - astronomerinc/ap-pgbouncer:1.8.1 >> >>>>>> >> >> > - astronomerinc/ap-pgbouncer-exporter:0.5.0-1 >> >>>>>> >> >> > >> >>>>>> >> >> > >> >>>>>> >> >> > *Proposal*: >> >>>>>> >> >> > >> >>>>>> >> >> > My proposal is to make a folder in our repository on Github >> >>>>>> (continue >> >>>>>> >> >> with >> >>>>>> >> >> > the mono-repo approach we follow) to keep corresponding >> >>>>>> Dockerfiles >> >>>>>> >> and >> >>>>>> >> >> > scripts that build and release images from there. Now the >> only >> >>>>>> >> >> > question is >> >>>>>> >> >> > where to keep those images. We currently have apache/airflow >> >>>>>> but I >> >>>>>> >> >> > think we >> >>>>>> >> >> > should reserve it for airflow images only and we >> should keep >> >>>>>> those >> >>>>>> >> images >> >>>>>> >> >> > elsewhere. Unfortunately, we cannot have "sub-images" >> of any >> >>>>>> >> sort in >> >>>>>> >> >> > DockerHub. We are already abusing a bit the "apache/airflow" >> >>>>>> >> >> namespace as >> >>>>>> >> >> > we are keeping both CI and production images there (but >> that's >> >>>>>> quite >> >>>>>> >> >> > OK as >> >>>>>> >> >> > the images are similar). >> >>>>>> >> >> > >> >>>>>> >> >> > My proposal will be to create an* "apache/airflow-ext"* >> >>>>>> DockerHub >> >>>>>> >> >> > repository and keep the images there. They will also >> be a >> >>>>>> little >> >>>>>> >> >> > abused because we will have to name them with tags - for >> >>>>>> example: >> >>>>>> >> >> > >> >>>>>> >> >> > - apache/airflow-ext:helm-unittest-[version] >> >>>>>> >> >> > - apache/airflow-ext:apache-rat-[version] >> >>>>>> >> >> > >> >>>>>> >> >> > I am also open to other names for the repo and proposals >> other >> >>>>>> ways >> >>>>>> >> >> > how to >> >>>>>> >> >> > handle that. 
>> >>>>>> >> >> > >> >>>>>> >> >> > I believe there is no issue with Licences for either of >> those >> >>>>>> images >> >>>>>> >> >> (Ash, >> >>>>>> >> >> > Kaxil, Fokko - some of the images are >> >>>>>> Astronomer's/GoDataDriven's >> >>>>>> >> >> ones - >> >>>>>> >> >> > can you comment on that ?) but I believe licensing on all >> >>>>>> those >> >>>>>> >> >> > images are >> >>>>>> >> >> > ok for us to copy with attribution (I will >> double-check that >> >>>>>> for other >> >>>>>> >> >> > images). >> >>>>>> >> >> > >> >>>>>> >> >> > WDYT? >> >>>>>> >> >> > >> >>>>>> >> >> > J. >> >>>>>> >> >> > >> >>>>>> >> >> > >> >>>>>> >> >> > >> >>>>>> >> >> > -- >> >>>>>> >> >> > >> >>>>>> >> >> > Jarek Potiuk >> >>>>>> >> >> > Polidea <https://www.polidea.com/> | Principal Software >> >>>>>> Engineer >> >>>>>> >> >> > >> >>>>>> >> >> > M: +48 660 796 129 <+48660796129> >> >>>>>> >> >> > [image: Polidea] <https://www.polidea.com/> >> >>>>>> >> >> > >> >>>>>> >> >> >> >>>>>> >> > >> >>>>>> >> >> >>>>>> > >> >>>>>> > >> >>>>>> > -- >> >>>>>> > >> >>>>>> > Jarek Potiuk >> >>>>>> > Polidea <https://www.polidea.com/> | Principal Software Engineer >> >>>>>> > >> >>>>>> > M: +48 660 796 129 <+48660796129> >> >>>>>> > [image: Polidea] <https://www.polidea.com/> >> >>>>>> > >> >>>>>> >> >>>>> >> >>>>> >> >>>>> -- >> >>>>> >> >>>>> Jarek Potiuk >> >>>>> Polidea <https://www.polidea.com/> | Principal Software Engineer >> >>>>> >> >>>>> M: +48 660 796 129 <+48660796129> >> >>>>> [image: Polidea] <https://www.polidea.com/> >> >>>>> >> >>>>> >> >>>> >> >>>> -- >> >>>> >> >>>> Jarek Potiuk >> >>>> Polidea <https://www.polidea.com/> | Principal Software Engineer >> >>>> >> >>>> M: +48 660 796 129 <+48660796129> >> >>>> [image: Polidea] <https://www.polidea.com/> >> >>>> >> >>>> >> >>> >> >>> -- >> >>> >> >>> Jarek Potiuk >> >>> Polidea <https://www.polidea.com/> | Principal Software Engineer >> >>> >> >>> M: +48 660 796 129 <+48660796129> >> >>> [image: Polidea] <https://www.polidea.com/> 
>> >>> >> >>> >> >> >> >> -- >> >> >> >> Jarek Potiuk >> >> Polidea <https://www.polidea.com/> | Principal Software Engineer >> >> >> >> M: +48 660 796 129 <+48660796129> >> >> [image: Polidea] <https://www.polidea.com/> >> >> >> >> >> > >> > -- >> > >> > Jarek Potiuk >> > Polidea <https://www.polidea.com/> | Principal Software Engineer >> > >> > M: +48 660 796 129 <+48660796129> >> > [image: Polidea] <https://www.polidea.com/> >> > >> > > > -- > > Jarek Potiuk > Polidea <https://www.polidea.com/> | Principal Software Engineer > > M: +48 660 796 129 <+48660796129> > [image: Polidea] <https://www.polidea.com/> >
