And one more perfect illustration of what I am talking about.

A very good thing just happened. I was running the PR while writing the
email (it took a long time, as you might imagine) and the new K8S tests with
1.10.11 just failed: https://github.com/apache/airflow/pull/9663

If we had released the Helm chart before, we would now have a clear (small)
incompatibility here. And by seeing the test fail, we can decide
what to do:

1) fix it differently
2) document it as a breaking Helm change ("1.10.12+ image") and make the test
work in both cases
3) revert ...

But at least we have an early warning that something is wrong. This is the
clear value of running the tests on every commit.

J.

On Sun, Jul 5, 2020 at 10:08 AM Jarek Potiuk <jarek.pot...@polidea.com>
wrote:

> I just have another example of a case where splitting the repos and using
> only "released versions" across repositories might be complete overkill
> when it comes to development complexity.
>
> We have this change from Aneesh:
> https://github.com/apache/airflow/pull/9371 about adding a git-sync
> option to the helm chart.
>
> That's a new feature, but we would like to test both the 1.10 and the master
> versions of KubernetesExecutor with it. It should work for both of them -
> there is no coupling/dependency in the "airflow" code for it.
>
> However, there is a strong coupling in the tests. We have the
> "kubernetes_tests" running tests using all three: chart, production docker,
> and Airflow. Those tests will likely have to be adapted to work with the
> new git-sync option. They were disabled previously, as we had problems with
> them before the helm chart was used for tests, but we can turn them back on
> now that git-sync is added to the helm chart. Those tests are part of the
> airflow test suite, and we discussed with Daniel that they should stay there;
> those tests import airflow code, and they use the latest example DAGs,
> which are also in the airflow code.
>
> So we have two ways to develop this:
> A) monorepo (current)
> B) separate repos.
>
> Just to remind - the goal is that our change is tested against:
>
> 1) Released Airflow version (say 1.10.11).
> 2) Development Airflow version (master - soon possibly "development").
> 3) Development Docker image built with either "development" or "1.10.11"
> (we can release the Docker image for 1.10.11 independently from the current
> development HEAD). The Docker image is supposed to work with any version of
> Airflow.
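The three combinations above can be sketched as a small CI test matrix. This is a minimal Python sketch under assumptions. The version labels come from the list above, but the `KubernetesTestCase` record and the `build_matrix` helper are hypothetical names for illustration, not Airflow's actual CI code.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class KubernetesTestCase:
    """One cell of a hypothetical Kubernetes test matrix."""

    airflow_version: str  # "development" (from sources) or a released PyPI version
    image_source: str     # which Dockerfile the image is built from
    chart_source: str     # which Helm chart version is deployed


def build_matrix(airflow_versions, image_sources=("development",),
                 chart_sources=("development",)):
    """Return every combination of Airflow version, image source and chart source."""
    return [
        KubernetesTestCase(airflow, image, chart)
        for airflow, image, chart in product(airflow_versions, image_sources, chart_sources)
    ]


if __name__ == "__main__":
    # Monorepo case: development image and chart, tested with two Airflow versions.
    for case in build_matrix(["development", "1.10.11"]):
        print(case)
```

Adding another dimension later, such as the executors or MySQL mentioned further down, would only mean passing one more iterable to `product`.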
>
> In the case of A) Monorepo we have all that as a given.
>
> I just sent this really small PR that should do the job:
> https://github.com/apache/airflow/pull/9663. What it does: it takes the
> latest "development" Docker image and the "development" chart, and bakes in
> the latest "example dags" from the "development" branch. The image uses
> either the "development" or the released (from PyPI) "1.10.11" Airflow
> version, and runs the "development" tests against it. This is exactly what
> we want. If we add new features to the helm chart, the Kubernetes tests will
> have to be updated to include them - and this will happen in the airflow
> "development" branch. The REALLY good thing about it - since we are running
> those tests in the CI build of the airflow development branch - is that we
> prevent anyone from making breaking changes. It is a given that both the
> "development" version of Airflow and the "1.10.11" version of Airflow will
> continue to work with the image and chart.
>
>
> In the case of B) where we split the repos:
>
> We have to decide where to keep the "kubernetes_tests" - should they be in
> "Airflow" or in "Helm"? They are testing BOTH, so we can choose either way.
> Together with Daniel we plan to expand those tests to cover all the
> different options we have in the Chart - testing all of it - Kubernetes
> Executor, Celery Executor running on Kubernetes, MySQL (once we add it),
> etc. So we want to make sure we have a matrix of tests covering a
> number of deployment options. Those tests do not exist yet, and they will
> have to be written. In principle - they can be moved to the "Helm"
> repository. That's where they conceptually belong. However - there is
> huge value in running the tests in airflow "development" - the value is
> that no one will be able to break the "development" airflow, because those
> tests are run with every PR. I think we have no choice but to always run
> those tests in development. Otherwise, people maintaining the helm chart
> will have to fix the problems introduced by people changing Airflow code. I
> think it is a pretty bad idea to allow that. So if we move those tests to the
> Helm Chart repo, we have to figure out how to run those "kubernetes" tests
> in CI for every build. This is quite possible - by getting the latest
> master of the helm chart and running the build - but it has several problems:
>
> 1) The test code for CI will have to stay in Airflow (to run
> CI builds) - this means that we already have coupling, and some code related
> to the execution of the helm tests has to be in Airflow anyway.
>
> 2) Bigger problem. What happens if, as an "Airflow developer", you DO introduce
> a change that breaks the helm chart? You will see a CI error and... you
> will not know what to do. Do you involve the people who maintain the helm chart
> and wait for them? I think not. You should be able to reproduce the problem
> locally and fix it yourself (maybe with the help of others - but you should
> be able to fix your own commit). We would have to teach people how to bring in
> the docker image and helm chart code from the latest version and run the
> tests. We could do it automatically with Breeze (similarly to what we do with
> other integrations - where we bring in Kerberos, Mongo, and a multitude of
> others) without them even knowing it, but this might be fairly complex and
> prone to errors. In a monorepo we already have a simple way of reproducing
> and running the tests locally, and everything is in one place.
>
> 3) There is a chance that someone makes a change in Helm in parallel to a
> change in Airflow that breaks it. This could easily happen in the "git-sync
> case" or when we add "MySQL" for example in the future. And there is no way
> to prevent it.
>
> 4) If we only test against "released" Helm and Airflow (that was one of
> the suggestions), the problem is even bigger. How do you know that you do
> not break the currently "developed" helm chart? Or how do you know that the
> currently "developed" helm chart works with the latest Airflow release? If you
> do not do those checks at "commit" time, then you defer them to
> "release time", and only then might you find out that decisions you made
> during development have to be reverted. This is a very, very bad idea IMHO,
> again leading to a situation where the release manager has to fix
> problems introduced by others.
>
> J,
>
>
>
> On Fri, Jul 3, 2020 at 10:28 PM Ash Berlin-Taylor <a...@apache.org> wrote:
>
>> Monorepo FTW.
>>
>> Yes, it gets a little bit messier around release, but the approach of
>> automatically extracting the commits (or parts of commits) to a
>> separate repo for releasing may be the solution to that problem.
>>
>>
>> -ash
>>
>> On Jul 3 2020, at 7:51 pm, Kaxil Naik <kaxiln...@gmail.com> wrote:
>>
>> > I will take a look at the Kubernetes approach and get back to this thread.
>> >
>> >> We had a discussion with Daniel yesterday and we are both concerned about
>> >> all the overhead for people like us who work on all three "entities" at the
>> >> same time. Even just explaining how to work with Pull Requests, and in what
>> >> sequence those PRs would have to be opened and merged in the case of changes
>> >> that span several "entities", was a challenge. I was unable
>> >> to clearly explain the sequence and the way of reviewing/merging the PRs
>> >> that would have to be made if we have submodules. This is a bad sign, as I
>> >> have used submodules in the past and know how they work, but I was unable
>> >> to explain it clearly.
>> >
>> >
>> > We don't even need submodules tbh. We can just use a Bash script that
>> > pulls a pinned Helm Chart version.
>> > We only need the Helm chart to run integration tests for k8s (at least for now).
>> > We already use tons of Bash scripts.
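Kaxil's pinned-chart alternative could look roughly like the following. It is written in Python rather than Bash to match the other sketches here. The repository URL follows the `apache/airflow-helm-chart` name proposed elsewhere in this thread. The tag `2020.07.10` and the `chart` target directory are hypothetical. Only the `git clone --depth 1 --branch <tag>` flags are standard git.

```python
from __future__ import annotations

import subprocess

# Hypothetical values: this repository and tag did not exist at the time of writing.
CHART_REPO = "https://github.com/apache/airflow-helm-chart.git"
PINNED_CHART_VERSION = "2020.07.10"


def pinned_chart_clone_command(repo: str, tag: str, target_dir: str = "chart") -> list[str]:
    """Build a shallow clone of exactly one tagged chart release."""
    return ["git", "clone", "--depth", "1", "--branch", tag, repo, target_dir]


def fetch_pinned_chart(dry_run: bool = True) -> list[str]:
    """Return the clone command and, unless dry_run is set, execute it."""
    cmd = pinned_chart_clone_command(CHART_REPO, PINNED_CHART_VERSION)
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError if the clone fails
    return cmd


if __name__ == "__main__":
    print(" ".join(fetch_pinned_chart()))
```

With this design, bumping `PINNED_CHART_VERSION` becomes an explicit commit in apache/airflow. That explicit commit is exactly the decoupling, and the delay, being debated in this thread.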
>> >
>> > One of the important benefits of separation is that changes in one
>> > component should not require changes in another component, at least
>> > not immediately.
>> >
>> > Changes in the Helm chart and Dockerfile should never require changes in Airflow.
>> > Changes in Airflow should only ever require a change in the Dockerfile and Helm
>> > Chart after a new version is released.
>> >
>> > I just had a talk with Daniel too and still didn't find a good enough
>> > reason to have them in the same repo.
>> >
>> > I will definitely look at the Kubernetes approach (maybe it is better) and
>> > get back to this thread. But as of now I don't see any major PROs
>> > for having them in the same repo.
>> >
>> > Regards,
>> > Kaxil
>> >
>> >
>> >
>> > On Fri, Jul 3, 2020 at 5:00 PM Jarek Potiuk <jarek.pot...@polidea.com>
>> > wrote:
>> >
>> >> I think Ry's point is an important one - I thought about writing a longer
>> >> post, but I looked at the Kubernetes structure and I really like it, so I
>> >> just wanted to comment on this last one.
>> >>
>> >> It seems that there is simply one "authoritative" (or source of truth) repo
>> >> where everything is developed in monorepo fashion, but then there is a bot
>> >> that moves every commit related to subdirectories to those "split-out"
>> >> repos. There are never direct commits by people or PRs in the "split-out"
>> >> repositories. This is very similar to my original proposal to have
>> >> dedicated repos used for releases - but with an automated way of
>> >> publishing the commits to the "separated" repos at the moment they are
>> >> merged to master in the main repo. I love it.
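At its core, the publishing bot described above maps each merged commit to the split-out repos whose subdirectories it touches. The following is a minimal sketch of that routing step. It assumes a hypothetical mapping of `chart/` and `docker/` to the repo names proposed in this thread. The actual Kubernetes bot is a separate tool with its own configuration, and this sketch does not reproduce it.

```python
from __future__ import annotations

# Hypothetical mapping from monorepo subdirectory to split-out repository.
SPLIT_OUT_REPOS = {
    "chart/": "apache/airflow-helm-chart",
    "docker/": "apache/airflow-docker-image",
}


def target_repos(changed_paths: list[str]) -> set[str]:
    """Return the split-out repos that should receive a commit touching these paths."""
    targets: set[str] = set()
    for path in changed_paths:
        for prefix, repo in SPLIT_OUT_REPOS.items():
            if path.startswith(prefix):
                targets.add(repo)
    return targets


def strip_prefix(path: str) -> str:
    """Rewrite a monorepo path to its location inside the split-out repo."""
    for prefix in SPLIT_OUT_REPOS:
        if path.startswith(prefix):
            return path[len(prefix):]
    raise ValueError(f"{path!r} does not belong to any split-out repo")


if __name__ == "__main__":
    print(target_repos(["chart/values.yaml", "airflow/models/dag.py"]))
```

A commit that only touches core code is published nowhere. A commit that touches both `chart/` and `docker/` is replayed into both repos, each seeing only its own files.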
>> >>
>> >> I think it's a really good and "pragmatic" solution. The code is
>> >> available in separate repos, including the history of commits related to
>> >> each "entity" (so only chart-related commits in the chart repo). Issues for
>> >> particular "entities" are in those separate repos as well (something that
>> >> Kaxil mentioned). Users (not developers!) who are interested only in the
>> >> Dockerfile or Helm Chart have separate repos they can look at - with only
>> >> relevant changes and the history of releases for that particular entity.
>> >> They can raise issues there (and in GitHub, we can easily refer to those
>> >> issues from the main "airflow" repo). All the discussion from "user
>> >> issues" is kept in the relevant repositories. Still - comments about
>> >> development changes (and related issues) can be kept in the main
>> >> "airflow" repo - next to other "development" changes.
>> >>
>> >> We can run separate releases from those linked repositories and even
>> >> publish sources directly from those repositories rather than from the
>> >> main one. At the same time - we avoid all the hassle of submodules.
>> >>
>> >> We had a discussion with Daniel yesterday and we are both concerned about
>> >> all the overhead for people like us who work on all three "entities" at the
>> >> same time. Even just explaining how to work with Pull Requests, and in what
>> >> sequence those PRs would have to be opened and merged in the case of changes
>> >> that span several "entities", was a challenge. I was unable
>> >> to clearly explain the sequence and the way of reviewing/merging the PRs
>> >> that would have to be made if we have submodules. This is a bad sign, as I
>> >> have used submodules in the past and know how they work, but I was unable
>> >> to explain it clearly.
>> >>
>> >> I really, really like the Kubernetes approach - it seems to be one of the
>> >> cases where we can "have our cake and eat it too".
>> >>
>> >> J.
>> >>
>> >>
>> >> On Thu, Jul 2, 2020 at 5:59 PM Ry Walker <r...@rywalker.com> wrote:
>> >>
>> >> > One reason to have a monorepo is for project branding and end user
>> >> > experience. But for component development experience, it's nice to have
>> >> > a small, dedicated repo.
>> >> >
>> >> > I think the git submodule approach is technically sound, but is at odds
>> >> > with making the project easy to consume/understand from the end user
>> >> > perspective, especially if we expand the use of subprojects. And the
>> >> > main Airflow commit graph would appear to be slowing down, which is bad
>> >> > for Airflow brand perception.
>> >> >
>> >> > Kubernetes has many sub-repos that are integrated into the main repo -
>> >> > which I think could be the best of both worlds:
>> >> > Example: https://github.com/kubernetes/kubernetes/tree/master/staging
>> >> >
>> >> > I haven't dug in very deeply, and I won't pretend to understand how
>> >> > challenging it may be to maintain this structure, but I'd support
>> >> > breaking more components out of the main Airflow repo for dev purposes
>> >> > (for example, in the future, it'd be nice to have airflow-cli,
>> >> > airflow-api, airflow-scheduler, and individual provider repos that are
>> >> > cleanly separated) as long as we bring the commits/contributions back
>> >> > into the monorepo with automation.
>> >> >
>> >> > Maybe we could dive a little deeper into how K8s is operating before
>> >> > going with submodules?
>> >> >
>> >> > -Ry
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > On Thu, Jul 2, 2020 at 11:24 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
>> >> >
>> >> > > Let's come to a consensus first before we do anything :-)
>> >> > >
>> >> > > Is everyone happy with the separate repo approach? Let's wait for 72 hours
>> >> > > to hear from everyone and then make a plan for how we do it. WDYT?
>> >> > >
>> >> > > But indeed the git submodules approach sounds good. We do it for *Airflow
>> >> > > Site* (https://github.com/apache/airflow-site/tree/master/landing-pages/site/themes)
>> >> > > too.
>> >> > >
>> >> > > Regards,
>> >> > > Kaxil
>> >> > >
>> >> > > On Thu, Jul 2, 2020 at 4:15 PM Jarek Potiuk <jarek.pot...@polidea.com>
>> >> > > wrote:
>> >> > >
>> >> > > > Absolutely - I am happy to add "best practices" and a short "howto do
>> >> > > > stuff with git submodules" - and this knowledge will only be needed for
>> >> > > > interacting with the prod image/helm chart/running Kubernetes tests. For
>> >> > > > all the other purposes it should be "business as usual".
>> >> > > >
>> >> > > > On Thu, Jul 2, 2020 at 4:53 PM Daniel Imberman <daniel.imber...@gmail.com>
>> >> > > > wrote:
>> >> > > >
>> >> > > > > I think git submodules sound like a great idea. We would need to write
>> >> > > > > this into the CONTRIBUTING.md to let people know how to do it, but
>> >> > > > > it’s a “teach once” situation.
>> >> > > > >
>> >> > > > > On Thu, Jul 2, 2020 at 2:44 AM, Tomasz Urbaszek <turbas...@apache.org>
>> >> > > > > wrote:
>> >> > > > > I support the idea of separate repos. The git submodules mentioned by
>> >> > > > > Jarek sound like an interesting solution. It may add some complexity
>> >> > > > > for new contributors, but it's not rocket science. If we agree on
>> >> > > > > using this, we should add a small how-to in contributing.rst I think
>> >> > > > > (e.g. do I have to have a fork of each repo?).
>> >> > > > >
>> >> > > > > As stressed previously, if we go this route we should make sure we
>> >> > > > > have nice testing of all those three components. Regarding the
>> >> > > > > versioning, I have no strong opinion, but I fully support using
>> >> > > > > separate issues for airflow, docker, and helm.
>> >> > > > >
>> >> > > > > Tomek
>> >> > > > >
>> >> > > > >
>> >> > > > > On Thu, Jul 2, 2020 at 9:26 AM Jarek Potiuk <jarek.pot...@polidea.com>
>> >> > > > > wrote:
>> >> > > > > >
>> >> > > > > > On Thu, Jul 2, 2020 at 3:16 AM Daniel Imberman <daniel.imber...@gmail.com>
>> >> > > > > > wrote:
>> >> > > > > >
>> >> > > > > > > I’m fine with keeping it as three separate repos but merging
>> >> > > > > > > testing somehow (e.g. the source code chart would pull the
>> >> > > > > > > helm/docker chart into .build), but we need to do it in a way that
>> >> > > > > > > doesn’t make testing too difficult.
>> >> > > > > > >
>> >> > > > > > > So for example: how do I test/integration test a change that
>> >> > > > > > > involves a change to all three and has to be done at the same
>> >> > > > > > > time? Perhaps a user can “register” a branch of helm and docker
>> >> > > > > > > when they start up breeze? Or perhaps we create a “parent”
>> >> > > > > > > integration test that uses the three together?
>> >> > > > > >
>> >> > > > > > Yes, those are exactly my concerns when splitting the repos.
>> >> > > > > >
>> >> > > > > > I think testing for development should remain in the "airflow" repo.
>> >> > > > > > It is the "central one" in fact. I slept on it, and I think using
>> >> > > > > > "released" versions for development testing will suffer from the "we
>> >> > > > > > need a change in all three of those" problem.
>> >> > > > > >
>> >> > > > > > But we have an easy solution I think.
>> >> > > > > >
>> >> > > > > > I think that simply setting up submodules properly should do the job:
>> >> > > > > > https://git-scm.com/book/en/v2/Git-Tools-Submodules. They seem to be
>> >> > > > > > perfect for our case.
>> >> > > > > >
>> >> > > > > > For those who have not used them - in short - submodules work in
>> >> > > > > > the way that they register the "linked repos" and store the related
>> >> > > > > > "hash" of the commit from that linked repo. For example, the "chart"
>> >> > > > > > folder will be a link to "apache/airflow-helm-chart". We can also
>> >> > > > > > move the prod Dockerfile to a subfolder and link it to the separate
>> >> > > > > > repo. Git submodule has a built-in mechanism to a) update to the
>> >> > > > > > latest version of the repo, and b) commit your changes to the linked
>> >> > > > > > repo from there, which is all we need. I have used them a few times -
>> >> > > > > > I never liked submodules for sharing "library" code, but for sharing
>> >> > > > > > helm/Docker they seem perfect.
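For readers who have not seen one, the submodule registration lives in a `.gitmodules` file written in git-config syntax. The pinned commit hash itself is recorded in the superproject's tree as a special "gitlink" entry, not in `.gitmodules`. The sketch below parses such a file. The `chart`/`docker` entries and their URLs are hypothetical and follow the repo names proposed in this thread. The parser handles only the simple `key = value` subset of the format.

```python
from __future__ import annotations

import re

# Hypothetical .gitmodules content for the layout proposed in this thread.
GITMODULES = """\
[submodule "chart"]
\tpath = chart
\turl = https://github.com/apache/airflow-helm-chart.git
[submodule "docker"]
\tpath = docker
\turl = https://github.com/apache/airflow-docker-image.git
"""

SECTION = re.compile(r'^\[submodule "(?P<name>[^"]+)"\]$')
ENTRY = re.compile(r"^(?P<key>[\w.-]+)\s*=\s*(?P<value>.*)$")


def parse_gitmodules(text: str) -> dict[str, dict[str, str]]:
    """Map each submodule name to its settings (path, url, ...)."""
    modules: dict[str, dict[str, str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        section = SECTION.match(line)
        if section:
            current = modules.setdefault(section.group("name"), {})
            continue
        entry = ENTRY.match(line)
        if entry and current is not None:
            current[entry.group("key")] = entry.group("value").strip()
    return modules


if __name__ == "__main__":
    for name, cfg in parse_gitmodules(GITMODULES).items():
        print(f"{cfg['path']} -> {cfg['url']}")
```

In a real checkout, `git config -f .gitmodules --list` reads the same file without custom parsing. The sketch only makes the structure visible.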
>> >> > > > > >
>> >> > > > > > From the "regular" developer point of view - you do not need to
>> >> > > > > > get/update submodules if you do not need to use them - so for all
>> >> > > > > > development purposes, if you only change the "airflow" code, you
>> >> > > > > > would not even need to sync the chart or Dockerfile. You do "git
>> >> > > > > > checkout" as usual and it should work. So basically - no change for
>> >> > > > > > "regular" airflow development.
>> >> > > > > >
>> >> > > > > > However, if you do need to work on helm + Docker + code, then you
>> >> > > > > > simply do "git submodule update", go to the linked "helm" or
>> >> > > > > > "docker" folder, check out the "master" version and start making
>> >> > > > > > changes. The only thing to remember when you want to push your
>> >> > > > > > changes is to run `git push --recurse-submodules=check`, which
>> >> > > > > > verifies that the submodule commits have been pushed to the linked
>> >> > > > > > repos before the main repo is pushed. It is a bit involved, but the
>> >> > > > > > latest git versions have very good support, and it only needs to be
>> >> > > > > > used by people who work on airflow + docker + helm - all the others
>> >> > > > > > are unaffected.
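The cross-repo workflow described above can be written down as a dry-run command sequence. This is a sketch, not a tested script. The `chart` submodule path and the commit message are hypothetical. The git commands themselves are standard: `git submodule update --init`, `git -C <dir>` and `git push --recurse-submodules=check`. Note that `check` only verifies that the submodule commits referenced by the commits being pushed exist on a submodule remote, and aborts the push otherwise. By contrast, `on-demand` would push them automatically.

```python
from __future__ import annotations

import subprocess


def cross_repo_change_commands(submodule: str = "chart",
                               message: str = "Update chart") -> list[list[str]]:
    """Commands for a change spanning the main repo and one submodule (hypothetical paths)."""
    return [
        ["git", "submodule", "update", "--init", submodule],  # fetch linked repo at its pinned commit
        ["git", "-C", submodule, "checkout", "master"],       # leave the detached HEAD to start work
        # ... edit files in the submodule and in the main repo here ...
        ["git", "-C", submodule, "commit", "-am", message],   # commit inside the linked repo
        ["git", "-C", submodule, "push"],                     # publish the submodule commit first
        ["git", "add", submodule],                            # record the new submodule commit hash
        ["git", "commit", "-m", message],                     # commit the main-repo side
        ["git", "push", "--recurse-submodules=check"],        # abort if a submodule commit is unpublished
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in commands:
        print("$", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(cross_repo_change_commands())
```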
>> >> > > > > >
>> >> > > > > > From the CI perspective nothing changes either - when we check out
>> >> > > > > > the code we will include submodules, and our test harness will be
>> >> > > > > > largely unchanged. Submodules provide us with the right mechanism for
>> >> > > > > > cross-dependency even if we use branches.
>> >> > > > > >
>> >> > > > > > If everyone is OK with that, I am happy to set it up. With
>> >> > > > > > submodules we can switch to separate repos even without releasing the
>> >> > > > > > helm chart and prod image "officially".
>> >> > > > > >
>> >> > > > > > J.
>> >> > > > > >
>> >> > > > > >
>> >> > > > > >
>> >> > > > > > >
>> >> > > > > > > On Wed, Jul 1, 2020 at 3:20 PM, Jarek Potiuk <jarek.pot...@polidea.com>
>> >> > > > > > > wrote:
>> >> > > > > > > Sure. We can work with such an approach. There will be some
>> >> > > > > > > dependencies that we might find problematic, but if we all see that
>> >> > > > > > > it's worth trying, there is a clear benefit that it makes for a
>> >> > > > > > > "clean" split between those different "entities". And possibly once
>> >> > > > > > > we release the first versions of both the image and chart, such
>> >> > > > > > > problems will be rare and easy to fix.
>> >> > > > > > >
>> >> > > > > > > I personally think such a split is inevitable eventually; it's just
>> >> > > > > > > a matter of when to do it. If we decide to make this happen soon, I
>> >> > > > > > > am more than happy to work on making the split a reality.
>> >> > > > > > >
>> >> > > > > > > One prerequisite to that is that all of those - Helm Chart, Prod
>> >> > > > > > > Image and Airflow - are released separately and "officially" in
>> >> > > > > > > stable versions from the current sources (otherwise there will be no
>> >> > > > > > > way to test cross-repo).
>> >> > > > > > >
>> >> > > > > > > I think for that we will need to agree on the versioning scheme and
>> >> > > > > > > cadence for the Image and Helm Chart, then copy sources from airflow
>> >> > > > > > > and release them as a "baseline", including setting up the tests for
>> >> > > > > > > all of those - then we can remove both Helm and Dockerfile from the
>> >> > > > > > > airflow repo. Happy to help with that if that's the direction we
>> >> > > > > > > choose as a community. It is important, though, that we keep the
>> >> > > > > > > cross-repo testing working. We have it working as of yesterday, so
>> >> > > > > > > now the point is that whatever we do, we keep it running and have a
>> >> > > > > > > development environment that supports easy development and testing
>> >> > > > > > > of any of the three (including CI testing across repos). That's the
>> >> > > > > > > only really important thing to me - the rest is more of a
>> >> > > > > > > technicality of how we link the repos, but the principle remains.
>> >> > > > > > >
>> >> > > > > > > Do we have an idea for the versioning scheme that we would like to
>> >> > > > > > > use for the Helm Chart and prod image?
>> >> > > > > > >
>> >> > > > > > > Should we make it CalVer <https://calver.org/overview.html> or
>> >> > > > > > > SemVer <https://semver.org/> (or some other scheme)? And how should
>> >> > > > > > > we treat the combinations with Airflow?
>> >> > > > > > >
>> >> > > > > > > My thoughts (but I have no strong opinions as long as someone
>> >> > > > > > > proposes more sensible versioning schemes):
>> >> > > > > > >
>> >> > > > > > > 1) Airflow code - we continue the release scheme we have (with
>> >> > > > > > > deciding on the 2.* scheme for the release). I expect that in the
>> >> > > > > > > future we might decide on doing branches or patches, so for 2.* I'd
>> >> > > > > > > opt for a full SemVer approach, with patches released from branches.
>> >> > > > > > >
>> >> > > > > > > 2) I believe that the Helm Chart can be versioned with its own
>> >> > > > > > > version (then you specify the image version as a helm parameter).
>> >> > > > > > > For the Helm Chart I think CalVer might be OK, as I do not expect
>> >> > > > > > > any branching/patches in the future - I'd expect that there will be
>> >> > > > > > > a single stream of releases.
>> >> > > > > > >
>> >> > > > > > > 3) Dockerfile (+ related files such as .dockerignore, empty dir,
>> >> > > > > > > entrypoints etc.). I do not imagine a lot of branching for those -
>> >> > > > > > > we should be able to release a new version of a Dockerfile (+
>> >> > > > > > > related files) working with nearly any earlier Airflow release, so
>> >> > > > > > > CalVer seems like a good choice.
>> >> > > > > > >
>> >> > > > > > > 4) Image versioning becomes a bit more complex because the image
>> >> > > > > > > tag is always a combination of:
>> >> > > > > > > * Dockerfile (+ related files) version
>> >> > > > > > > * Airflow Version
>> >> > > > > > > * Python Version
>> >> > > > > > >
>> >> > > > > > > An example versioning I can imagine:
>> >> > > > > > >
>> >> > > > > > > *Airflow*: 1.10.11, 1.10.12, 2.0.0, 2.1.0, 2.1.1 - patch level (if
>> >> > > > > > > we decide to have patches).
>> >> > > > > > > *Dockerfile*: 2020.07.12, 2020.08.20 ...... -> depending on when we
>> >> > > > > > > release them
>> >> > > > > > > *Helm Chart*: 2020.07.10, 2020.08.09 ...... Each Helm Chart has a
>> >> > > > > > > minimum version of both the Dockerfile and Airflow that it works with.
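The "minimum version" rule for a Helm Chart release can be expressed as a simple comparison. Both CalVer (`YYYY.MM.DD`) and plain `X.Y.Z` SemVer strings compare correctly once converted to integer tuples, so one helper covers both. This sketch rests on some assumptions. The `ChartRequirements` record and the concrete minimum versions are hypothetical, and pre-release suffixes such as `rc1` are not handled.

```python
from __future__ import annotations

from dataclasses import dataclass


def version_key(version: str) -> tuple[int, ...]:
    """Turn "1.10.11" or "2020.07.12" into a tuple of ints that compares numerically."""
    return tuple(int(part) for part in version.split("."))


@dataclass(frozen=True)
class ChartRequirements:
    """Hypothetical minimum versions declared by one Helm Chart release."""

    min_dockerfile: str  # CalVer, e.g. "2020.07.12"
    min_airflow: str     # SemVer, e.g. "1.10.11"


def is_compatible(req: ChartRequirements, dockerfile: str, airflow: str) -> bool:
    """True if both the Dockerfile and Airflow versions meet the chart's minimums."""
    return (version_key(dockerfile) >= version_key(req.min_dockerfile)
            and version_key(airflow) >= version_key(req.min_airflow))
```

Take minimums of `2020.07.12` and `1.10.11`. The pair `2020.08.20`/`1.10.12` passes, and `2020.08.20`/`1.10.10` fails. Comparing the raw strings instead would go wrong, because lexically `"1.10.9" > "1.10.11"`.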
>> >> > > > > > >
>> >> > > > > > > *Example Docker Image tags:*
>> >> > > > > > >
>> >> > > > > > > apache/airflow:dockerfile2020.07.10-airflow1.10.10-python3.6
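The proposed tag format can be built and parsed mechanically, which is one way to check that it is unambiguous. This sketch assumes exactly the `dockerfile<CalVer>-airflow<version>-python<X.Y>` layout of the example tag. The helper names are hypothetical, and versions containing `-` would need a different separator.

```python
import re
from typing import NamedTuple

TAG_PATTERN = re.compile(
    r"^dockerfile(?P<dockerfile>\d{4}\.\d{2}\.\d{2})"
    r"-airflow(?P<airflow>[0-9][0-9A-Za-z.]*)"
    r"-python(?P<python>\d+\.\d+)$"
)


class ImageVersion(NamedTuple):
    dockerfile: str  # CalVer of the Dockerfile release
    airflow: str     # Airflow version installed in the image
    python: str      # Python major.minor version


def build_tag(v: ImageVersion, repository: str = "apache/airflow") -> str:
    """Compose the full image reference from its three version components."""
    return f"{repository}:dockerfile{v.dockerfile}-airflow{v.airflow}-python{v.python}"


def parse_tag(image: str) -> ImageVersion:
    """Split an image reference back into its version components."""
    repository, _, tag = image.rpartition(":")
    match = TAG_PATTERN.match(tag)
    if not repository or match is None:
        raise ValueError(f"unexpected image tag: {image!r}")
    return ImageVersion(**match.groupdict())


if __name__ == "__main__":
    print(parse_tag("apache/airflow:dockerfile2020.07.10-airflow1.10.10-python3.6"))
```

Because `build_tag` and `parse_tag` round-trip, CI could, for example, check that every published tag still parses.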
>> >> > > > > > >
>> >> > > > > > > WDYT?
>> >> > > > > > >
>> >> > > > > > > J,
>> >> > > > > > >
>> >> > > > > > >
>> >> > > > > > > On Wed, Jul 1, 2020 at 11:12 PM Kaxil Naik <kaxiln...@gmail.com>
>> >> > > > > > > wrote:
>> >> > > > > > >
>> >> > > > > > > > I think we should have "separate repos for development" too.
>> >> > > > > > > >
>> >> > > > > > > > 3 Repos in total:
>> >> > > > > > > >
>> >> > > > > > > > 1) apache/airflow
>> >> > > > > > > > 2) apache/airflow-docker-image
>> >> > > > > > > > 3) apache/airflow-helm-chart
>> >> > > > > > > >
>> >> > > > > > > >
>> >> > > > > > > > (1) *apache/airflow* should use a pinned stable version of the
>> >> > > > > > > > Airflow Helm chart to run Kubernetes tests.
>> >> > > > > > > > (2) *apache/airflow* already has a *Dockerfile.ci* file, which it
>> >> > > > > > > > can use to run airflow tests on docker images.
>> >> > > > > > > > (3) *apache/airflow-docker-image* should use the latest available
>> >> > > > > > > > stable version of airflow.
>> >> > > > > > > > (4) *apache/airflow-helm-chart* should use the latest available
>> >> > > > > > > > stable version of airflow.
>> >> > > > > > > >
>> >> > > > > > > > > Having such a split also makes some updates more difficult - for
>> >> > > > > > > > > example, if we add a new "extra" to Airflow that requires installing
>> >> > > > > > > > > an "apt" dependency in the Dockerfile, we will have to split it into
>> >> > > > > > > > > first adding the dependency to the Dockerfile and, once that is
>> >> > > > > > > > > merged, adding the extra to airflow in setup.py.
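For the "new extra needs an apt package" example above, a monorepo CI check can flag the gap in the same PR. What follows is a rough sketch. The extra-to-package mapping and the Dockerfile fragment are hypothetical and do not reflect Airflow's actual setup.py or Dockerfile. The parser only understands simple `apt-get install` lines.

```python
from __future__ import annotations

import re

# Hypothetical mapping: system packages some extras need at build time.
EXTRA_APT_DEPENDENCIES = {
    "mysql": {"libmysqlclient-dev"},
    "kerberos": {"libkrb5-dev"},
}

# Hypothetical Dockerfile fragment used for the example run.
EXAMPLE_DOCKERFILE = """\
RUN apt-get update \\
    && apt-get install -y --no-install-recommends \\
       libkrb5-dev \\
    && apt-get clean
"""


def apt_packages_in_dockerfile(dockerfile: str) -> set[str]:
    """Collect package names passed to `apt-get install` (simplified parsing)."""
    text = dockerfile.replace("\\\n", " ")  # join shell line continuations
    packages: set[str] = set()
    for match in re.finditer(r"apt-get install([^&;\n]*)", text):
        packages.update(tok for tok in match.group(1).split() if not tok.startswith("-"))
    return packages


def missing_apt_dependencies(extras: list[str], dockerfile: str) -> dict[str, set[str]]:
    """Map each extra to the apt packages it needs that the Dockerfile lacks."""
    installed = apt_packages_in_dockerfile(dockerfile)
    missing: dict[str, set[str]] = {}
    for extra in extras:
        gap = EXTRA_APT_DEPENDENCIES.get(extra, set()) - installed
        if gap:
            missing[extra] = gap
    return missing


if __name__ == "__main__":
    print(missing_apt_dependencies(["kerberos", "mysql"], EXAMPLE_DOCKERFILE))
```

In a single repo, such a check fails the PR that adds the extra until the Dockerfile is updated in the same change. With separate repos, the two-step sequence described above would be needed instead.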
>> >> > > > > > > >
>> >> > > > > > > >
>> >> > > > > > > > Adding a new extra to setup.py would not (and should not) impact
>> >> > > > > > > > the development of *apache/airflow-docker-image*.
>> >> > > > > > > > Once an RC is cut for apache/airflow, or after a new version is
>> >> > > > > > > > released for apache/airflow, we can work on supporting the new
>> >> > > > > > > > airflow version in the Production Docker Image.
>> >> > > > > > > > While doing that we can add all the libraries that are needed by
>> >> > > > > > > > the new Airflow version, and we will have a clean commit history and
>> >> > > > > > > > changelog for the Docker image.
>> >> > > > > > > >
>> >> > > > > > > > We definitely do not need to work on both repos in parallel. By
>> >> > > > > > > > doing development in a separate repo we keep consistent "source"
>> >> > > > > > > > files and we can release each artifact with a separate cadence. If
>> >> > > > > > > > someone discovers a bug in a newly released Docker image, we should
>> >> > > > > > > > easily be able to cut a new release with the patch without worrying
>> >> > > > > > > > about how development is going in the apache/airflow repo.
>> >> > > > > > > >
>> >> > > > > > > >
>> >> > > > > > > > *Apache Flink & Apache CouchDB* do it in a similar manner:
>> >> > > > > > > >
>> >> > > > > > > > https://github.com/apache/flink & https://github.com/apache/flink-docker
>> >> > > > > > > > https://github.com/apache/couchdb & https://github.com/apache/couchdb-docker
>> >> > > > > > > >
>> >> > > > > > > > Regards,
>> >> > > > > > > > Kaxil
>> >> > > > > > > >
>> >> > > > > > > >
>> >> > > > > > > >
>> >> > > > > > > >
>> >> > > > > > > >
>> >> > > > > > > >
>> >> > > > > > > > On Wed, Jul 1, 2020 at 9:50 PM Jarek Potiuk <jarek.pot...@polidea.com>
>> >> > > > > > > > wrote:
>> >> > > > > > > >
>> >> > > > > > > > > I do not think it's only a question of mono/multi repos. While I
>> >> > > > > > > > > clearly see the benefit of separate repos, I also see some drawbacks.
>> >> > > > > > > > >
>> >> > > > > > > > > And if it bothers others, I am happy to follow the majority. If we
>> >> > > > > > > > > think that a bit more complexity in testing justifies separating
>> >> > > > > > > > > those three completely and keeping things more "clean", it's also
>> >> > > > > > > > > workable, but IMHO it introduces certain complexity in development.
>> >> > > > > > > > >
>> >> > > > > > > > > However, I think this is not 0/1; in my opinion a kind of hybrid
>> >> > > > > > > > > approach might be the best of both worlds - for development and
>> >> > > > > > > > > releases.
>> >> > > > > > > > >
>> >> > > > > > > > > Let me explain what I mean by "Hybrid":
>> >> > > > > > > > >
>> >> > > > > > > > > I think we definitely should have separate repositories to
>> >> > > > > > > > > release those artifacts, and I think there is no doubt about it:
>> >> > > > > > > > >
>> >> > > > > > > > > * airflow (apache/airflow)
>> >> > > > > > > > > * prod docker image (apache/airflow-docker)
>> >> > > > > > > > > * helm chart (apache/airflow-helm)
>> >> > > > > > > > > * api clients (we already have separate repos for those)
>> >> > > > > > > > >   (apache/airflow-client-*)
>> >> > > > > > > > >
>> >> > > > > > > > > I think the only question is where we develop all those (develop
>> >> > > > > > > > > != release). There are certain benefits of having a single
>> >> > > > > > > > > "master" (let's call it "development" from here on) for all those
>> >> > > > > > > > > artifacts. Currently the "development" version of all of them is
>> >> > > > > > > > > in one repo. During development one depends on the other, and we
>> >> > > > > > > > > also test all of them together, which means we know that the
>> >> > > > > > > > > "current best" set of airflow sources (including dependencies in
>> >> > > > > > > > > setup.py), Dockerfile and Helm chart work together. This means,
>> >> > > > > > > > > for example, that you will not be able to break the Helm Chart by
>> >> > > > > > > > > changing anything in airflow that the Helm chart depends on. For
>> >> > > > > > > > > example, if you change "airflow webserver" into "airflow server",
>> >> > > > > > > > > the current Helm chart will break. Similarly, if you change
>> >> > > > > > > > > entrypoint.sh in the Docker image in a way that is not compatible
>> >> > > > > > > > > with the Helm chart, we will not let that happen: the CI tests
>> >> > > > > > > > > will break if any of those changes in an incompatible way. And we
>> >> > > > > > > > > can have dependencies in any direction between those three. When
>> >> > > > > > > > > we see a commit break any of the three, we can decide what to do:
>> >> > > > > > > > > either accept and document the incompatibility, or fix it.
>> >> > > > > > > > >
>> >> > > > > > > > > Of course, keeping that property (testing it all together) is
>> >> > > > > > > > > also possible if they are in completely separate repos. However,
>> >> > > > > > > > > there are several cross-dependencies. For example, building the
>> >> > > > > > > > > Docker image depends on the dependencies in setup.py: you cannot
>> >> > > > > > > > > build the Docker image from the Dockerfile alone without the
>> >> > > > > > > > > Airflow sources, nor build and test the Helm chart without the
>> >> > > > > > > > > image (and the sources, because that's where the current
>> >> > > > > > > > > kubernetes tests are). If we want to keep doing this for both the
>> >> > > > > > > > > Helm chart and the Dockerfile, we would have to check out the
>> >> > > > > > > > > latest Airflow sources and run the CI tests before merging any
>> >> > > > > > > > > Docker or Helm Chart change, and the opposite: we would have to
>> >> > > > > > > > > download the Dockerfile/Helm chart, build the image and install
>> >> > > > > > > > > the Helm chart when running CI tests for Airflow. This is possible
>> >> > > > > > > > > and we could do it, but it adds complexity to the build/CI
>> >> > > > > > > > > process.
>> >> > > > > > > > >
>> >> > > > > > > > > Such a split also makes some updates more difficult. For example,
>> >> > > > > > > > > if we add a new "extra" to Airflow that requires installing an
>> >> > > > > > > > > "apt" dependency in the Dockerfile, we will have to split the
>> >> > > > > > > > > change: first add the dependency to the Dockerfile, and once that
>> >> > > > > > > > > is merged, add the extra to airflow's setup.py. This makes it
>> >> > > > > > > > > quite difficult to test them together (the Dockerfile change can
>> >> > > > > > > > > only be tested fully after it is merged to master). Not to
>> >> > > > > > > > > mention the complexity of managing different versions, for
>> >> > > > > > > > > example your local development Dockerfile version vs. the Airflow
>> >> > > > > > > > > sources. Imagine switching between branches where you add two
>> >> > > > > > > > > different apt dependencies to the Dockerfile. I can imagine more
>> >> > > > > > > > > similar scenarios, especially for parallel changes in those repos.
>> >> > > > > > > > >
>> >> > > > > > > > > Keeping them separate is of course doable, but with separate
>> >> > > > > > > > > repos it is quite a bit more complex to set things up (especially
>> >> > > > > > > > > a consistent development environment), and preventing
>> >> > > > > > > > > cross-breaking changes might be more difficult.
>> >> > > > > > > > >
>> >> > > > > > > > > I believe the best way is to continue developing airflow + image
>> >> > > > > > > > > + chart in one repo (airflow), but release them from those
>> >> > > > > > > > > separate repos.
>> >> > > > > > > > >
>> >> > > > > > > > > The Airflow source release does not have to contain either the
>> >> > > > > > > > > chart or the image. And even if it contains their sources, those
>> >> > > > > > > > > are not the final "artifacts" (the installable image and the
>> >> > > > > > > > > installable Helm chart). Whenever we decide to release either of
>> >> > > > > > > > > them, we test it in "development". Only once it is tested do we
>> >> > > > > > > > > copy the sources to those separate repos and release them.
>> >> > > > > > > > >
>> >> > > > > > > > > With git we can even do it very easily while preserving the
>> >> > > > > > > > > history of commits (been there, done that). And then we could
>> >> > > > > > > > > release the Helm chart and the Docker image separately, based on
>> >> > > > > > > > > the commits and tags in those separate repositories.
>> >> > > > > > > > >
>> >> > > > > > > > > I agree that separate repos are a "cleaner" approach. But I think
>> >> > > > > > > > > they are less convenient for development consistency.
>> >> > > > > > > > >
>> >> > > > > > > > > J.
>> >> > > > > > > > >
>> >> > > > > > > > >
>> >> > > > > > > > >
>> >> > > > > > > > > On Wed, Jul 1, 2020 at 9:35 PM Kaxil Naik <kaxiln...@gmail.com>
>> >> > > > > > > > > wrote:
>> >> > > > > > > > >
>> >> > > > > > > > > > Forgot to mention: having them in separate repos also helps in
>> >> > > > > > > > > > better managing each individual artifact.
>> >> > > > > > > > > >
>> >> > > > > > > > > > Each repo would have its own GitHub Issues, where we can track
>> >> > > > > > > > > > issues specific to the Helm chart or the Dockerfile.
>> >> > > > > > > > > >
>> >> > > > > > > > > > Regards,
>> >> > > > > > > > > > Kaxil
>> >> > > > > > > > > >
>> >> > > > > > > > > > On Wed, Jul 1, 2020 at 8:30 PM Kaxil Naik <kaxiln...@gmail.com>
>> >> > > > > > > > > > wrote:
>> >> > > > > > > > > >
>> >> > > > > > > > > > > The PMC also needs to agree on whether we want separate VOTING
>> >> > > > > > > > > > > for the Docker image and the Helm chart. I think we do.
>> >> > > > > > > > > > >
>> >> > > > > > > > > > > Regards,
>> >> > > > > > > > > > > Kaxil
>> >> > > > > > > > > > >
>> >> > > > > > > > > > > On Wed, Jul 1, 2020 at 8:06 PM Kaxil Naik <kaxiln...@gmail.com>
>> >> > > > > > > > > > > wrote:
>> >> > > > > > > > > > >
>> >> > > > > > > > > > >> Hi all,
>> >> > > > > > > > > > >>
>> >> > > > > > > > > > >> What do you all think about having the Dockerfile and Helm
>> >> > > > > > > > > > >> chart in the same "Airflow" repo vs. separate ones?
>> >> > > > > > > > > > >>
>> >> > > > > > > > > > >> I feel that having separate repos for the Airflow Dockerfile
>> >> > > > > > > > > > >> and Helm chart has more benefits, like easier tracking of
>> >> > > > > > > > > > >> changes (via Changelog), being easier for new contributors,
>> >> > > > > > > > > > >> and a separate release cadence.
>> >> > > > > > > > > > >>
>> >> > > > > > > > > > >> Currently, the Dockerfile and Helm Chart are inside the same
>> >> > > > > > > > > > >> repo, and when we release the changelog for a new Airflow
>> >> > > > > > > > > > >> version, it includes all changes (Airflow + Dockerfile + Helm
>> >> > > > > > > > > > >> chart), which I think is not that great.
>> >> > > > > > > > > > >>
>> >> > > > > > > > > > >> Also, having them all inside a single repo means that changes
>> >> > > > > > > > > > >> in the Helm Chart and Dockerfile can block an Airflow release.
>> >> > > > > > > > > > >> We could use stable Helm Chart and Dockerfile versions to test
>> >> > > > > > > > > > >> Airflow so that they are not blockers to the release.
>> >> > > > > > > > > > >>
>> >> > > > > > > > > > >> Happy to hear the thoughts from the community.
>> >> > > > > > > > > > >>
>> >> > > > > > > > > > >> Regards,
>> >> > > > > > > > > > >> Kaxil
>> >> > > > > > > > > > >>
>> >> > > > > > > > > > >
>> >> > > > > > > > > >
>> >> > > > > > > > >
>> >> > > > > > > > >
>> >> > > > > > > > > --
>> >> > > > > > > > >
>> >> > > > > > > > > Jarek Potiuk
>> >> > > > > > > > > Polidea <https://www.polidea.com/> | Principal Software Engineer
>> >> > > > > > > > >
>> >> > > > > > > > > M: +48 660 796 129 <+48660796129>
>> >> > > > > > > > > [image: Polidea] <https://www.polidea.com/>
>> >> > > > > > > > >