And the right Greg here :(,

J.



On Thu, Jul 2, 2020 at 12:18 PM Jarek Potiuk <[email protected]>
wrote:

> Hey Ash, Greg, Daniel,
>
> So I understand there is no problem with licenses for those images and we
> can get/use the sources for those?
>
> I would love to add the scripts/Dockerfiles to the sources - to be able to
> rebuild the images. I have some of those already and would like to make a
> PR, but it would be great if we can get the Dockerfile sources. I also want
> to ask a few questions about versions of the base images (some of the base
> images seem to be quite old and there are newer releases so I wanted to
> check if there is anything to prevent upgrading them).
>
> J
>
>
> On Thu, Jun 25, 2020 at 12:58 PM Jarek Potiuk <[email protected]>
> wrote:
>
>>
>> On Thu, Jun 25, 2020 at 12:27 PM Ash Berlin-Taylor <[email protected]>
>> wrote:
>>
>>> > - apache/airflow:statsd-exporter-2020.6.31
>>> > - apache/airflow:pgbouncer-2020.6.31
>>> > - apache/airflow:pgbouncer-exporter-2020.6.31
>>>
>>> Do we count these as "releases" (i.e. do the PMC need to vote on them)
>>> or not?
>>>
>>
>> I think we should. I believe we should make it a part of regular release
>> and vote together on "airflow + prod image + helm + dependent images".
>> Then we might release each of those separately if needed -  with
>> separate voting/process (possibly we can bundle together several different
>> things to release). Hence CalVer might make more sense even if we release
>> them together with 1.10.x or 2.Y (especially that those deps are pretty
>> much independent from the airflow version used). I think for Airflow + Prod
>> image, it makes perfect sense to keep 1.10.* 2.0.* - but for Helm and
>> dependent images - CalVer seems like a better idea.
>>
>>
>>> For these I think including the upstream version is useful too (either
>>> as well, or instead) -- that way people can look at the right version of
>>> the upstream docs when looking at what configuration options there are.
>>> so `apache/airflow:pgbouncer-1.8.1-1` or
>>> `apache/airflow:pgbouncer-1.8.1-2020.6.31` (nice date btw :D )
>>>
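
A minimal sketch of the combined tag scheme discussed here (component, upstream version, then a CalVer date or build number), in Python. The names `ImageTag`, `parse_tag` and `format_reference` and the regular expression are illustrative assumptions, not part of any existing Airflow tooling.

```python
import re
from typing import NamedTuple


class ImageTag(NamedTuple):
    component: str         # e.g. "pgbouncer" or "pgbouncer-exporter"
    upstream_version: str  # version of the packaged upstream project
    release: str           # CalVer date such as "2020.6.31", or a build number such as "1"


# Hypothetical pattern: <component>-<upstream x.y.z>-<release>
_TAG_RE = re.compile(
    r"^(?P<component>[a-z][a-z0-9]*(?:-[a-z][a-z0-9]*)*)"
    r"-(?P<upstream>\d+\.\d+\.\d+)"
    r"-(?P<release>\d+(?:\.\d+){0,2})$"
)


def parse_tag(tag: str) -> ImageTag:
    """Split a tag into its three parts, or raise ValueError if it does not fit."""
    match = _TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"unrecognised tag: {tag!r}")
    return ImageTag(match["component"], match["upstream"], match["release"])


def format_reference(repo: str, tag: ImageTag) -> str:
    """Build the full image reference, e.g. for a Helm values file."""
    return f"{repo}:{tag.component}-{tag.upstream_version}-{tag.release}"
```

With this shape, the upstream version can be read straight from the tag when looking up the matching upstream docs, while the release part still records when the image was rebuilt.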
>>
>> Agree. BTW. I wondered if anyone notices the date ;).
>>
>>> (FYI For pgbouncer-exporter there are three such projects on github,
>>> Juraj's was picked somewhat randomly)
>>>
>>> > I think now it's the matter of just following up with the
>>> > releases of pgbouncer and libressl and libressl-dev
>>>
>>> That's still a fairly big "just". And the SSL libraries aren't the
>>> only sources of security patches needed. Also the act of updating is the
>>> easy part -- it's the notification to know when updates are needed, and
>>> ensuring that they happen in a timely manner that is the hard part :)
>>>
>>
>> True. But I think we have some precedent in our CI/Prod images. We have
>> it currently automated so that they self-maintain and self-upgrade:
>> https://github.com/apache/airflow/blob/master/CI.rst. The current CI
>> automation is done in such a way that we catch up fairly quickly with
>> the latest Python patches - almost without noticing (well, there is a
>> period of a few hours where the builds on CI get slower and people need
>> to update their Breeze images). But other than that it happens
>> automatically and without anyone doing any active work there.
>>
>> I can take a very similar approach for all the images (both dev and
>> runtime) and add a notification component to notify us if any of the
>> upstream deps change. So it will be - from our side - mostly deciding
>> whether we should release out-of-band or wait for a "regular" release.
>>
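
A rough sketch of what the comparison step of such a notification component could look like, in Python. How the latest upstream versions are discovered is left out entirely; `version_key` and `find_outdated` are hypothetical names, and the "latest" values used in the note after the code are made up for illustration.

```python
from typing import Dict, List, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Turn "1.8.1" into (1, 8, 1) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def find_outdated(pinned: Dict[str, str], latest: Dict[str, str]) -> List[str]:
    """Return one notification line per dependency with a newer upstream release."""
    messages = []
    for name in sorted(pinned):
        newest = latest.get(name)
        # Dependencies with no known upstream version are skipped, not flagged.
        if newest is not None and version_key(newest) > version_key(pinned[name]):
            messages.append(f"{name}: {pinned[name]} -> {newest}")
    return messages
```

For example, with `pgbouncer` pinned at 1.8.1 and a made-up upstream 1.14.0, the function reports `pgbouncer: 1.8.1 -> 1.14.0`; a plain string comparison would get this wrong, which is why the versions are compared as integer tuples.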
>> J.
>>
>>
>>> On Jun 25 2020, at 11:05 am, Jarek Potiuk <[email protected]>
>>> wrote:
>>>
>>> > I think  I'd feel more comfortable if we have it all under "community"
>>> > umbrella.
>>> >
>>> >   - For dev images - I think we have a good idea from couchdb. I will
>>> make
>>> >   a POC of that and PR shortly. I already created airflowdev account on
>>> >   Dockerhub and make it available to PMCs of Airflow and connect it to
>>> our
>>> >   repo to automate Dev dependencies.
>>> >   - For the runtime (astronomer) images I took a deeper look and I
>>> think
>>> >   it makes perfect sense to add them and release by Airflow Community
>>> > as well:
>>> >
>>> > Here is what is in those images:
>>> >
>>> >   - astronomerinc/ap-statsd-exporter
>>> >   <
>>> https://hub.docker.com/layers/astronomerinc/ap-statsd-exporter/latest/images/sha256-69538dc71521489733bb21823505a75a02a4c54d1d07eaa2be9fa7eb58763b7f?context=explore
>>> >
>>> >   - this image is just based on the official Prometheus statsd
>>> >   exporter with an added file "/etc/statsd-exporter/mappings.yml". So
>>> >   the maintenance is mainly about maintaining the mapping and
>>> >   occasionally upgrading to the latest released prometheus-statsd. The
>>> >   first one sounds like a good idea for community work, the second we
>>> >   can easily automate - same way as we do for production images. Seems
>>> >   that this one is updated once every few months, so we can easily do
>>> >   that.
>>> >   - astronomerinc/ap-pgbouncer
>>> >   <
>>> https://hub.docker.com/layers/astronomerinc/ap-pgbouncer/latest/images/sha256-9820007e1e62eb988cb603929b1eaf0989052cd01b73a3004274b21d143f9654?context=explore
>>> >
>>> >   - this is just packaging pgbouncer into an image - this one seems to
>>> >   have been updated more frequently in the past but I think now it's
>>> >   the matter of just following up with the releases of pgbouncer and
>>> >   libressl and libressl-dev
>>> >
>>> >   - astronomerinc/ap-pgbouncer-exporter
>>> >   <
>>> https://hub.docker.com/layers/astronomerinc/ap-pgbouncer-exporter/latest/images/sha256-6e9d4f2d66dafecfd2f29239a1957edb2c953b8299e487ec8b04a96206d2da4e?context=explore
>>> >
>>> >   - this is pgbouncer exporter based on Juraj Bubniak's PGBouncer
>>> Prometheus
>>> >   exporter with libressl and libressl-dev library upgraded. Also
>>> usually
>>> >   updated every few months. Here I think it would also make sense to
>>> bring
>>> >   the source code in to the community for Juraj's image as well.
>>> >
>>> > I also think it would make sense (unlike the dev dependencies) to
>>> publish
>>> > all "runtime" deps under the "apache/airflow" repository. That would
>>> > be a
>>> > bit awkward, but I think it's the least "effort" we need to maintain
>>> and
>>> > make sure it is officially "blessed" during the release.
>>> >
>>> > So the proposal I have (if we use calver versioning similar to backport
>>> > packages):
>>> >
>>> >   - apache/airflow:statsd-exporter-2020.6.31
>>> >   - apache/airflow:pgbouncer-2020.6.31
>>> >   - apache/airflow:pgbouncer-exporter-2020.6.31
>>> >
>>> > I am happy to bring it all to our repo and setup automation.
>>> >
>>> > J.
>>> >
>>> >
>>> >
>>> > On Thu, Jun 25, 2020 at 11:19 AM Ash Berlin-Taylor <[email protected]>
>>> wrote:
>>> >
>>> >> Wow Kamil that's an awesome and mature process for a company to take
>>> --
>>> >> I wish more companies treated open source deps that way.
>>> >>
>>> >> As I mentioned in the original Helm PR (but just in a comment left to
>>> a
>>> >> review), I left a few of the "support" Docker images as astronomerinc
>>> >> ones as the upstream Docker images are "unmaintained" (that isn't to
>>> say
>>> >> the projects are, just that the images aren't re-published in a timely
>>> >> fashion to update openssl etc.)
>>> >>
>>> >> I am happy to replace the astronomerinc support images with others if
>>> we
>>> >> want to. I am also happy to clarify/make explicit the license
>>> situation
>>> >> that those images are distributed under (Apache 2) if we want to stick
>>> >> with them and let us (Astronomer) carry the burden of patching and
>>> >> updating them -- it is after all part of what people pay us for so
>>> we'll
>>> >> be doing it anyway.
>>> >>
>>> >> > Besides, we should provide the possibility to replace "Object code"
>>> with
>>> >> > other objects i.e., use of an image from a private third-party
>>> registry.
>>> >>
>>> >> The images to use come from the helm values, so are easily changeable
>>> at
>>> >> helm install/upgrade time:
>>> >>
>>> >>
>>> >>
>>> https://github.com/apache/airflow/blob/ec0025f35be212b248c284efa04acf2d96845681/chart/values.yaml#L68-L92
>>> >>
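
As a small illustration of overriding the images at install/upgrade time, here is a Python sketch that builds `--set` arguments for Helm. `--set` is a standard Helm flag, but the `images.<name>.repository` / `images.<name>.tag` key layout is an assumption about the linked values.yaml, and `helm_set_args` and `registry.example.com` are hypothetical.

```python
from typing import Dict, List


def helm_set_args(overrides: Dict[str, Dict[str, str]]) -> List[str]:
    """Build `--set images.<name>.<field>=<value>` arguments in a stable order.

    `overrides` maps an image name (assumed to be a key under `images:` in
    values.yaml) to the fields to replace, e.g. "repository" and "tag".
    """
    args: List[str] = []
    for image_name in sorted(overrides):
        for field in sorted(overrides[image_name]):
            value = overrides[image_name][field]
            args += ["--set", f"images.{image_name}.{field}={value}"]
    return args
```

The resulting list can be appended to a `helm install` or `helm upgrade` command line, so that an image from a private registry replaces the default one without editing the chart.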
>>> >> -ash
>>> >>
>>> >> On Jun 24 2020, at 9:07 am, Kamil Breguła <[email protected]>
>>> >> wrote:
>>> >>
>>> >> > These files have no information to determine the license.  In my
>>> opinion,
>>> >> > these images ("Derivative Works") should be treated as Astronomer's
>>> or
>>> >> > other users' copyrighted files. Please note that Astronomer may
>>> >> distribute
>>> >> > the images under a different license, but they need to acknowledge
>>> the
>>> >> use
>>> >> > of the Foundation or other licensed software. To do otherwise would
>>> be
>>> >> > stealing.
>>> >> >
>>> >> > DockerHub is not an Open Source software registry, and we cannot
>>> assume
>>> >> > that every image there is available under a license that allows
>>> >> free use.
>>> >> >
>>> >> > **What does this mean for the project?**
>>> >> >
>>> >> > This is incompatible with the Apache license because each runtime
>>> >> > dependency must also be under an Apache-compatible license. These
>>> >> > images are required to run the Helm Chart, so they are its
>>> >> > dependencies. Dependencies that are not compatible with the Apache
>>> >> > license are a problem for our users and prevent the use of this
>>> >> > project.
>>> >> >
>>> >> > **How do we deal with this topic in my organization?**
>>> >> >
>>> >> > We take the topic of copyright very seriously in my organization.
>>> >> One of
>>> >> > the steps we take before publishing a derivative work based on an
>>> >> > Open-Source license is to audit the source code to see if each part
>>> is
>>> >> > under a license that allows us to use it. If we build images or
>>> artifacts
>>> >> > automatically, we take steps that prevent the accidental publication
>>> >> > of an
>>> >> > artifact that could contain works that have an incorrect license.
>>> >> >
>>> >> > We do this by building the audited internal registry:
>>> >> > - In the case of Airflow, this is a copy of the source code and the
>>> >> > necessary PIP libraries stored in the blockchain-based registry
>>> >> > (append-only registry). Any change in such a registry undergoes a
>>> review
>>> >> > process and must be approved. It is not possible to revert an
>>> approved
>>> >> > change without leaving a trace.
>>> >> > - In the case of Docker images, this means that each image is built
>>> >> > automatically, and no one publishes the images to the image registry
>>> >> > manually (docker push). No step can download files from a registry
>>> >> > that is not auditable.
>>> >> >
>>> >> > Such steps allow you to recreate the software development process,
>>> >> > e.g. in
>>> >> > the case of a court case.
>>> >> >
>>> >> > In our case, it won't be easy to introduce all similar requirements,
>>> >> > but we
>>> >> > can try to be compatible with them so that organizations that have
>>> the
>>> >> same
>>> >> > requirements can meet them.
>>> >> >
>>> >> > **What should we do?**
>>> >> >
>>> >> > In my opinion, this is similar to using libraries in our
>>> application.
>>> >> > We do
>>> >> > not perform a publisher assessment for every library we use. We only
>>> >> verify
>>> >> > license compliance.
>>> >> >
>>> >> > On the other hand, it looks different because it is "Object Code",
>>> not
>>> >> > "Source Code". We do not use source code directly, but we use an
>>> object
>>> >> > prepared by a third party - "Derivative Works".
>>> >> >
>>> >> > In my opinion, relying on any Docker image ("Object Code") is OK if
>>> they
>>> >> > meet the following requirements:
>>> >> > - The Source Code required to create the object should be publicly
>>> >> > available and should be compatible with the Apache license.
>>> >> > - We should have access to Compilation Information. The Compilation
>>> >> > Information must suffice to ensure that the continued functioning of
>>> >> > the source code is in no case prevented or interfered with solely
>>> >> > because modification has been made.
>>> >> >
>>> >> > Besides, we should provide the possibility to replace "Object code"
>>> with
>>> >> > other objects i.e., use of an image from a private third-party
>>> registry.
>>> >> >
>>> >> > Thanks, Jarek, for paying attention to this issue. I didn't think
>>> >> > about it before, but now I know I couldn't use the Helm Chart in its
>>> >> > current form in any of my work. I am afraid that many members of our
>>> >> > community would face similar problems if they tried to use it in a
>>> >> > production environment.
>>> >> >
>>> >> >
>>> >> > On Mon, Jun 22, 2020 at 3:08 PM Ash Berlin-Taylor <[email protected]>
>>> >> wrote:
>>> >> >
>>> >> >> Licensing wise there is no issue from me: The astronomerinc images
>>> are
>>> >> >> just re-packaging of the upstream images to apply security fixes
>>> >> so are
>>> >> >> licensed under whatever the original image is (MIT or Apache2
>>> usually,
>>> >> >> else we wouldn't have put them in the helm chart PR)
>>> >> >>
>>> >> >> For background, the reason that we at Astronomer created
>>> >> >> ap-pgbouncer-exporter in the first place is that the upstream
>>> package
>>> >> >> does not patch/rebuild to address security vulnerabilities. By
>>> taking
>>> >> >> this in to airflow-ext it means we as a project become responsible
>>> for
>>> >> >> monitoring and testing that. (And don't be fooled into thinking the
>>> >> >> free scanners can detect all vulns here; we've found them to be of
>>> >> >> very variable and questionable accuracy.)
>>> >> >>
>>> >> >> That is a non-trivial amount of work for an open source project.
>>> >> >>
>>> >> >> Has this ever caused us any problems outside of Pip/python
>>> dependencies?
>>> >> >> (I'm not aware of any.) For runtime this maybe makes sense (again,
>>> I'm
>>> >> >> not yet convinced), but for test-only/dev-only deps this seems
>>> >> like a
>>> >> >> lot of work that we could better spend on working on Airflow. If
>>> >> we pin
>>> >> >> versions of docker image used then the only real risk is a left-pad
>>> >> >> scenario of "I'm deleting all my images" which is a minor risk.
>>> >> >>
>>> >> >> Do any other projects do anything like this? I haven't seen it
>>> >> >> before.
>>> >> >>
>>> >> >> I'd vote for doing nothing and addressing this in specific cases
>>> >> >> when it becomes a problem, because I do not see using third party
>>> >> >> docker images as a risk. I see it as a time saving measure.
>>> >> >>
>>> >> >> -ash
>>> >> >>
>>> >> >> On Jun 22 2020, at 1:42 pm, Jarek Potiuk <[email protected]
>>> >
>>> >> wrote:
>>> >> >>
>>> >> >> > Hello everyone,
>>> >> >> >
>>> >> >> > TL;DR; I noticed that we are accumulating some dependencies on
>>> >> >> > external binaries (downloads and Docker images) which make the
>>> >> >> > Apache Airflow Community a bit vulnerable to external
>>> >> >> > dependencies.  I would love your comments/opinions on the
>>> >> >> > proposal I made around this.
>>> >> >> >
>>> >> >> > *More explanation/status:*
>>> >> >> >
>>> >> >> > While depending on binaries that are officially "released" and
>>> >> >> > "managed" by the owning organizations is fine, I think it is a
>>> >> >> > bit risky to depend on the others long term and I think we
>>> >> >> > should aim to bring all those "vulnerable" dependencies into
>>> >> >> > community control.
>>> >> >> >
>>> >> >> > I reviewed all our code (or I think all !) looking for such
>>> >> dependencies
>>> >> >> > and prepared an "umbrella" issue where I proposed the approach
>>> >> we can
>>> >> >> take
>>> >> >> > for all such dependencies.
>>> >> >> >
>>> >> >> > I could have missed some - so if you find others feel free to
>>> >> comment/add
>>> >> >> > the new ones.
>>> >> >> > All the details are captured here:
>>> >> >> > https://github.com/apache/airflow/issues/9401 - I discussed the
>>> >> >> > context/motivation/current status and approach we can take for
>>> those
>>> >> >> > dependencies.
>>> >> >> >
>>> >> >> > A lot of those dependencies just need review and maybe some
>>> >> updates to
>>> >> >> > latest versions. And I do not think there is a lot to discuss for
>>> >> those.
>>> >> >> >
>>> >> >> > There is one point, however, that requires more deliberate
>>> >> action and
>>> >> >> some
>>> >> >> > decisions I think.
>>> >> >> >
>>> >> >> > We have some dependencies on Docker images that we are using from
>>> >> various
>>> >> >> > sources:
>>> >> >> > 1) officially maintained images
>>> >> >> > 2) images released by organizations that released them for their
>>> own
>>> >> >> > purpose, but they are not "officially maintained" by those
>>> >> organizations
>>> >> >> > 3) images released by private individuals
>>> >> >> >
>>> >> >> > While 1) is perfectly OK, I think for 2) and 3) we should bring
>>> the
>>> >> >> images
>>> >> >> > to Airflow community management. Here is the list of those
>>> >> images I
>>> >> found
>>> >> >> > that need to be moved to Airflow:
>>> >> >> >
>>> >> >> >   - aneeshkj/helm-unittest
>>> >> >> >   - ashb/apache-rat:0.13-1
>>> >> >> >   - godatadriven/krb5-kdc-server
>>> >> >> >   - polinux/stress (?)
>>> >> >> >   - osixia/openldap:1.2.0
>>> >> >> >   - astronomerinc/ap-statsd-exporter:0.11.0
>>> >> >> >   - astronomerinc/ap-pgbouncer:1.8.1
>>> >> >> >   - astronomerinc/ap-pgbouncer-exporter:0.5.0-1
>>> >> >> >
>>> >> >> >
>>> >> >> > *Proposal*:
>>> >> >> >
>>> >> >> > My proposal is to make a folder in our repository on Github
>>> (continue
>>> >> >> with
>>> >> >> > the mono-repo approach we follow) to keep corresponding
>>> Dockerfiles
>>> >> and
>>> >> >> > scripts that build and release images from there. Now the only
>>> >> >> > question is
>>> >> >> > where to keep those images. We currently have apache/airflow but
>>> I
>>> >> >> > think we
>>> >> >> > should reserve it for airflow images only and we should keep
>>> those
>>> >> images
>>> >> >> > elsewhere. Unfortunately, we cannot have "sub-images" of any
>>> >> sort in
>>> >> >> > DockerHub. We are already abusing a bit the "apache/airflow"
>>> >> >> namespace as
>>> >> >> > we are keeping both CI and production images there (but that's
>>> quite
>>> >> >> > OK as
>>> >> >> > the images are similar).
>>> >> >> >
>>> >> >> > My proposal will be to create an* "apache/airflow-ext"* DockerHub
>>> >> >> > repository and keep the images there. They will also be a little
>>> >> >> > abused because we will have to name them with tags - for example:
>>> >> >> >
>>> >> >> >   - apache/airflow-ext:helm-unittest-[version]
>>> >> >> >   - apache/airflow-ext:apache-rat-[version]
>>> >> >> >
>>> >> >> > I am also open to other names for the repo and proposals other
>>> ways
>>> >> >> > how to
>>> >> >> > handle that.
>>> >> >> >
>>> >> >> > I believe there is no issue with Licences for either of those
>>> images
>>> >> >> (Ash,
>>> >> >> > Kaxil, Fokko - some of the images are Astronomer's/GoDataDriven's
>>> >> >> ones -
>>> >> >> > can you comment on that ?)  but I believe licensing on all those
>>> >> >> > images are
>>> >> >> > ok for us to copy with attribution (I will double-check that for
>>> other
>>> >> >> > images).
>>> >> >> >
>>> >> >> > WDYT?
>>> >> >> >
>>> >> >> > J.
>>> >> >> >
>>> >> >> >
>>> >> >> >
>>> >> >> > --
>>> >> >> >
>>> >> >> > Jarek Potiuk
>>> >> >> > Polidea <https://www.polidea.com/> | Principal Software Engineer
>>> >> >> >
>>> >> >> > M: +48 660 796 129 <+48660796129>
>>> >> >> > [image: Polidea] <https://www.polidea.com/>
>>> >> >> >
>>> >> >>
>>> >> >
>>> >>
>>> >
>>> >
>>> >
>>>
>>
>>
>>
>>
>
>
>

