And the right Greg here :(, J.
On Thu, Jul 2, 2020 at 12:18 PM Jarek Potiuk <[email protected]> wrote:

> Hey Ash, Greg, Daniel,
>
> So I understand there is no problem with licenses for those images and we can get/use the sources for those?
>
> I would love to add the scripts/Dockerfiles to the sources - to be able to rebuild the images. I have some of those already and would like to make a PR, but it would be great if we can get the Dockerfile sources. I also want to ask a few questions about versions of the base images (some of the base images seem to be quite old and there are newer releases, so I wanted to check if there is anything to prevent upgrading them).
>
> J
>
> On Thu, Jun 25, 2020 at 12:58 PM Jarek Potiuk <[email protected]> wrote:
>
>> On Thu, Jun 25, 2020 at 12:27 PM Ash Berlin-Taylor <[email protected]> wrote:
>>
>>> > - apache/airflow:statsd-exporter-2020.6.31
>>> > - apache/airflow:pgbouncer-2020.6.31
>>> > - apache/airflow:pgbouncer-exporter-2020.6.31
>>>
>>> Do we count these as "releases" (i.e. do the PMC need to vote on them) or not?
>>
>> I think we should. I believe we should make it a part of the regular release and vote together on "airflow + prod image + helm + dependent images". Then we might release each of those separately if needed - with separate voting/process (possibly we can bundle together several different things to release). Hence CalVer might make more sense even if we release them together with 1.10.x or 2.Y (especially as those deps are pretty much independent of the airflow version used). I think for Airflow + Prod image, it makes perfect sense to keep 1.10.* 2.0.* - but for Helm and dependent images - CalVer seems like a better idea.
>>
>>> For these I think including the upstream version is useful too (either as well, or instead) -- that way people can look at the right version of the upstream docs when looking at what configuration options there are.
>>> so `apache/airflow:pgbouncer-1.8.1-1` or `apache/airflow:pgbouncer-1.8.1-2020.6.31` (nice date btw :D )
>>
>> Agree. BTW, I wondered if anyone would notice the date ;).
>>
>>> (FYI For pgbouncer-exporter there are three such projects on github, Juraj's was picked somewhat randomly)
>>>
>>> > I think now it's the matter of just following up with the
>>> > releases of pgbouncer and libressl and libressl-dev
>>>
>>> That's still a fairly big "just". And the SSL libraries aren't the only sources of security patches needed. Also the act of updating is the easy part -- it's the notification to know when updates are needed, and ensuring that they happen in a timely manner, that is the hard part :)
>>
>> True. But I think we have some precedent in our CI/Prod images. We have it currently automated so that they self-maintain and self-upgrade: https://github.com/apache/airflow/blob/master/CI.rst. The current CI automation is done in such a way that we catch up fairly quickly with the latest Python patches - almost without noticing (well, there is a period of a few hours where the builds on CI get slower and people need to update their Breeze images). But other than that it happens automatically and without anyone doing any active work there.
>>
>> I can take a very similar approach for all the images (both dev and runtime) and add a notification component to notify us if any of the upstream deps change. So it will be - from our side - mostly deciding if we should release it out-of-band or wait for a "regular" release.
>>
>> J.
>>
>>> On Jun 25 2020, at 11:05 am, Jarek Potiuk <[email protected]> wrote:
>>>
>>> > I think I'd feel more comfortable if we have it all under the "community" umbrella.
>>> >
>>> > - For dev images - I think we have a good idea from couchdb. I will make a POC of that and PR shortly.
>>> >   I already created the airflowdev account on Dockerhub, to make it available to PMCs of Airflow and connect it to our repo to automate dev dependencies.
>>> > - For the runtime (astronomer) images I took a deeper look and I think it makes perfect sense to add them and have them released by the Airflow Community as well:
>>> >
>>> > Here is what is in those images:
>>> >
>>> > - astronomerinc/ap-statsd-exporter
>>> >   <https://hub.docker.com/layers/astronomerinc/ap-statsd-exporter/latest/images/sha256-69538dc71521489733bb21823505a75a02a4c54d1d07eaa2be9fa7eb58763b7f?context=explore>
>>> >   - this image is just based on the official Prometheus statsd exporter with an added file "/etc/statsd-exporter/mappings.yml". So the maintenance is mainly about keeping the mapping and possibly upgrading to the latest released prometheus-statsd occasionally. The first one sounds like a good idea for community work, the second we can easily automate - the same way as we do for production images. It seems that this one is updated once every few months, so we can easily do that.
>>> > - astronomerinc/ap-pgbouncer
>>> >   <https://hub.docker.com/layers/astronomerinc/ap-pgbouncer/latest/images/sha256-9820007e1e62eb988cb603929b1eaf0989052cd01b73a3004274b21d143f9654?context=explore>
>>> >   - this is just packaging pgbouncer into an image - this one seems to have been updated more frequently in the past, but I think now it's the matter of just following up with the releases of pgbouncer and libressl and libressl-dev
>>> > - astronomerinc/ap-pgbouncer-exporter
>>> >   <https://hub.docker.com/layers/astronomerinc/ap-pgbouncer-exporter/latest/images/sha256-6e9d4f2d66dafecfd2f29239a1957edb2c953b8299e487ec8b04a96206d2da4e?context=explore>
>>> >   - this is a pgbouncer exporter based on Juraj Bubniak's PGBouncer Prometheus exporter with the libressl and libressl-dev libraries upgraded. Also usually updated every few months. Here I think it would also make sense to bring the source code for Juraj's image in to the community as well.
>>> >
>>> > I also think it would make sense (unlike the dev dependencies) to publish all "runtime" deps under the "apache/airflow" repository. That would be a bit awkward, but I think it's the least "effort" we need to maintain and make sure it is officially "blessed" during the release.
>>> >
>>> > So the proposal I have (if we use CalVer versioning similar to backport packages):
>>> >
>>> > - apache/airflow:statsd-exporter-2020.6.31
>>> > - apache/airflow:pgbouncer-2020.6.31
>>> > - apache/airflow:pgbouncer-exporter-2020.6.31
>>> >
>>> > I am happy to bring it all to our repo and set up automation.
>>> >
>>> > J.
>>> >
>>> > On Thu, Jun 25, 2020 at 11:19 AM Ash Berlin-Taylor <[email protected]> wrote:
>>> >
>>> >> Wow Kamil that's an awesome and mature process for a company to take -- I wish more companies treated open source deps that way.
>>> >>
>>> >> As I mentioned in the original Helm PR (but just in a comment left on a review), I left a few of the "support" Docker images as astronomerinc ones as the upstream Docker images are "unmaintained" (that isn't to say the projects are, just that the images aren't re-published in a timely fashion to update openssl etc.)
>>> >>
>>> >> I am happy to replace the astronomerinc support images with others if we want to. I am also happy to clarify/make explicit the license situation that those images are distributed under (Apache 2) if we want to stick with them and let us (Astronomer) carry the burden of patching and updating them -- it is after all part of what people pay us for so we'll be doing it anyway.
>>> >>
>>> >> > Besides, we should provide the possibility to replace "Object code" with
>>> >> > other objects i.e., use of an image from a private third-party registry.
>>> >>
>>> >> The images to use come from the helm values, so are easily changeable at helm install/upgrade time:
>>> >>
>>> >> https://github.com/apache/airflow/blob/ec0025f35be212b248c284efa04acf2d96845681/chart/values.yaml#L68-L92
>>> >>
>>> >> -ash
>>> >>
>>> >> On Jun 24 2020, at 9:07 am, Kamil Breguła <[email protected]> wrote:
>>> >>
>>> >> > These files have no information to determine the license. In my opinion, these images ("Derivative Works") should be treated as Astronomer's or other users' copyrighted files.
>>> >> > Please note that Astronomer may distribute the images under a different license, but they need to acknowledge the use of the Foundation or other licensed software. To do otherwise would be stealing.
>>> >> >
>>> >> > DockerHub is not an Open Source software registry, and we cannot assume that every image there is available under a license that allows free use.
>>> >> >
>>> >> > **What does this mean for the project?**
>>> >> >
>>> >> > This is incompatible with the Apache license because each runtime dependency must also be under an Apache-compatible license. These images are required to run the Helm Chart, so they are its dependencies. Dependencies that are not compatible with the Apache license are a problem for our users and prevent the use of this project.
>>> >> >
>>> >> > **How do we deal with this topic in my organization?**
>>> >> >
>>> >> > We take the topic of copyright very seriously in my organization. One of the steps we take before publishing a derivative work based on an Open-Source license is to audit the source code to see if each part is under a license that allows us to use it. If we build images or artifacts automatically, we take steps that prevent the accidental publication of an artifact that could contain works that have an incorrect license.
>>> >> >
>>> >> > We do this by building the audited internal registry:
>>> >> > - In the case of Airflow, this is a copy of the source code and the necessary PIP libraries stored in the blockchain-based registry (append-only registry). Any change in such a registry undergoes a review process and must be approved. It is not possible to revert an approved change without leaving a trace.
>>> >> > - In the case of Docker images, this means that each image is built automatically, and no one publishes the images to the image registry manually (docker push). No step can download files from a registry that is not auditable.
>>> >> >
>>> >> > Such steps allow you to recreate the software development process, e.g. in the case of a court case.
>>> >> >
>>> >> > In our case, it won't be easy to introduce all similar requirements, but we can try to be compatible with them so that organizations that have the same requirements can meet them.
>>> >> >
>>> >> > **What should we do?**
>>> >> >
>>> >> > In my opinion, this is similar to using libraries in our application. We do not perform a publisher assessment for every library we use. We only verify license compliance.
>>> >> >
>>> >> > On the other hand, it looks different because it is "Object Code", not "Source Code". We do not use source code directly, but we use an object prepared by a third party - "Derivative Works".
>>> >> >
>>> >> > In my opinion, relying on any Docker image ("Object Code") is OK if it meets the following requirements:
>>> >> > - The Source Code required to create the object should be publicly available and should be compatible with the Apache license.
>>> >> > - We should have access to Compilation Information. The Compilation Information must suffice to ensure that the continued functioning of the source code is in no case prevented or interfered with solely because modification has been made.
>>> >> >
>>> >> > Besides, we should provide the possibility to replace "Object code" with other objects i.e., use of an image from a private third-party registry.
>>> >> >
>>> >> > Thanks, Jarek, for paying attention to this issue.
>>> >> > I didn't think about it before, but now I know I couldn't use the Helm Chart in its current form in any of my work. I am afraid that many members of our community would face similar problems if they tried to use it in a production environment.
>>> >> >
>>> >> > On Mon, Jun 22, 2020 at 3:08 PM Ash Berlin-Taylor <[email protected]> wrote:
>>> >> >
>>> >> >> Licensing wise there is no issue from me: The astronomerinc images are just re-packaging of the upstream images to apply security fixes so are licensed under whatever the original image is (MIT or Apache2 usually, else we wouldn't have put them in the helm chart PR)
>>> >> >>
>>> >> >> For background, the reason that we at Astronomer created ap-pgbouncer-exporter in the first place is that the upstream package does not patch/rebuild to address security vulnerabilities. By taking this in to airflow-ext it means we as a project become responsible for monitoring and testing that. (And don't be fooled in to thinking the free scanners can detect all vulns here, we've found them to be of very variable, and questionable, accuracy.)
>>> >> >>
>>> >> >> That is a non-trivial amount of work for an open source project.
>>> >> >>
>>> >> >> Has this ever caused us any problems outside of Pip/python dependencies? (I'm not aware of any.) For runtime this maybe makes sense (again, I'm not yet convinced), but for test-only/dev-only deps this seems like a lot of work that we could better spend on working on Airflow. If we pin versions of the docker images used then the only real risk is a left-pad scenario of "I'm deleting all my images" which is a minor risk.
>>> >> >>
>>> >> >> Does any other project do anything like this? I haven't seen it before.
>>> >> >>
>>> >> >> I'd vote for doing nothing and addressing this in specific cases when it becomes a problem. Because I do not see using third-party docker images as a risk. I see it as a time saving measure.
>>> >> >>
>>> >> >> -ash
>>> >> >>
>>> >> >> On Jun 22 2020, at 1:42 pm, Jarek Potiuk <[email protected]> wrote:
>>> >> >>
>>> >> >> > Hello everyone,
>>> >> >> >
>>> >> >> > TL;DR; I noticed that we are accumulating some dependencies on external binaries (downloads and Docker images) which make the Apache Airflow Community a bit vulnerable to external dependencies. I would love your comments/opinions on the proposal I made around this.
>>> >> >> >
>>> >> >> > *More explanation/status:*
>>> >> >> >
>>> >> >> > While depending on binaries that are officially "released" and "managed" by the owning organizations is fine, I think it is a bit risky to depend on the rest long term and I think we should aim to bring all those "vulnerable" dependencies into community control.
>>> >> >> >
>>> >> >> > I reviewed all our code (or I think all!) looking for such dependencies and prepared an "umbrella" issue where I proposed the approach we can take for all such dependencies.
>>> >> >> >
>>> >> >> > I could have missed some - so if you find others feel free to comment/add the new ones.
>>> >> >> > All the details are captured here: https://github.com/apache/airflow/issues/9401 - I discussed the context/motivation/current status and approach we can take for those dependencies.
>>> >> >> >
>>> >> >> > A lot of those dependencies just need review and maybe some updates to the latest versions. And I do not think there is a lot to discuss for those.
>>> >> >> >
>>> >> >> > There is one point, however, that requires more deliberate action and some decisions I think.
>>> >> >> >
>>> >> >> > We have some dependencies on Docker images that we are using from various sources:
>>> >> >> > 1) officially maintained images
>>> >> >> > 2) images released by organizations that released them for their own purpose, but they are not "officially maintained" by those organizations
>>> >> >> > 3) images released by private individuals
>>> >> >> >
>>> >> >> > While 1) is perfectly OK, I think for 2) and 3) we should bring the images to Airflow community management. Here is the list of those images I found that need to be moved to Airflow:
>>> >> >> >
>>> >> >> > - aneeshkj/helm-unittest
>>> >> >> > - ashb/apache-rat:0.13-1
>>> >> >> > - godatadriven/krb5-kdc-server
>>> >> >> > - polinux/stress (?)
>>> >> >> > - osixia/openldap:1.2.0
>>> >> >> > - astronomerinc/ap-statsd-exporter:0.11.0
>>> >> >> > - astronomerinc/ap-pgbouncer:1.8.1
>>> >> >> > - astronomerinc/ap-pgbouncer-exporter:0.5.0-1
>>> >> >> >
>>> >> >> > *Proposal*:
>>> >> >> >
>>> >> >> > My proposal is to make a folder in our repository on Github (continue with the mono-repo approach we follow) to keep corresponding Dockerfiles and scripts that build and release images from there. Now the only question is where to keep those images. We currently have apache/airflow but I think we should reserve it for airflow images only and we should keep those images elsewhere. Unfortunately, we cannot have "sub-images" of any sort in DockerHub.
>>> >> >> > We are already abusing the "apache/airflow" namespace a bit as we are keeping both CI and production images there (but that's quite OK as the images are similar).
>>> >> >> >
>>> >> >> > My proposal would be to create an *"apache/airflow-ext"* DockerHub repository and keep the images there. It will also be a little abused because we will have to name the images with tags - for example:
>>> >> >> >
>>> >> >> > - apache/airflow-ext:helm-unittest-[version]
>>> >> >> > - apache/airflow-ext:apache-rat-[version]
>>> >> >> >
>>> >> >> > I am also open to other names for the repo and proposals for other ways to handle that.
>>> >> >> >
>>> >> >> > I believe there is no issue with licences for any of those images (Ash, Kaxil, Fokko - some of the images are Astronomer's/GoDataDriven's ones - can you comment on that?) and I believe licensing on all those images is ok for us to copy with attribution (I will double-check that for other images).
>>> >> >> >
>>> >> >> > WDYT?
>>> >> >> >
>>> >> >> > J.
>>> >> >> >
>>> >> >> > --
>>> >> >> > Jarek Potiuk
>>> >> >> > Polidea <https://www.polidea.com/> | Principal Software Engineer
>>> >> >> > M: +48 660 796 129
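
For illustration only: the tag form floated in the thread, a component name, an optional upstream version, and a CalVer date (for example `pgbouncer-1.8.1-2020.6.31`), could be checked mechanically before an image is published. The following is a minimal Python sketch under that assumption; `TAG_RE`, `ImageTag` and `parse_tag` are hypothetical names and not part of Airflow, and the exact tag layout was never settled in this thread. It does not handle the alternative build-number form (`pgbouncer-1.8.1-1`).

```python
import re
from datetime import date
from typing import NamedTuple, Optional

# Assumed layout: <component>[-<upstream x.y.z>]-<YYYY.M.D>
TAG_RE = re.compile(
    r"^(?P<component>[a-z][a-z-]*?)"
    r"(?:-(?P<upstream>\d+\.\d+\.\d+))?"
    r"-(?P<calver>\d{4}\.\d{1,2}\.\d{1,2})$"
)


class ImageTag(NamedTuple):
    component: str
    upstream: Optional[str]  # None when the tag carries no upstream version
    released: date


def parse_tag(tag: str) -> ImageTag:
    """Split a tag into its parts and check that the CalVer part is a real date."""
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"unrecognised tag: {tag!r}")
    year, month, day = (int(part) for part in match["calver"].split("."))
    # date() raises ValueError for impossible dates such as 2020.6.31.
    return ImageTag(match["component"], match["upstream"], date(year, month, day))


if __name__ == "__main__":
    for candidate in ("pgbouncer-1.8.1-2020.6.30", "pgbouncer-1.8.1-2020.6.31"):
        try:
            print(candidate, "->", parse_tag(candidate))
        except ValueError as err:
            print(candidate, "-> rejected:", err)
```

A check like this would reject the joke date from the proposal, since June has only 30 days.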
