Monorepo FTW.  

Yes, it gets a little bit messier around release, but the approach of
automatically extracting out the commits (or parts of commits) to a
separate repo for releasing may be the solution to that problem.


-ash

On Jul 3 2020, at 7:51 pm, Kaxil Naik <kaxiln...@gmail.com> wrote:

> I will take a look at the Kubernetes approach and get back to this thread.
>  
>> We had a discussion with Daniel yesterday and we are both concerned about
>> all the overhead for people like us who work on all three "entities" at the
>> same time. Even just explaining how to work with Pull Requests and in what
>> sequence those PRs would have to be opened and merged in case of changes
>> that are spanning across several "entities" - was a challenge. I was unable
>> to clearly explain the sequence and way of reviewing/merging the PRs that
>> will have to be made if we have submodules. This is a bad sign as I was
>> using submodules in the past and know how it works but I was unable to
>> explain it clearly.
>  
>  
> We don't even need submodules tbh. We can just use a Bash script that pulls a
> pinned Helm Chart version.
> We only need the Helm chart to run integration tests for k8s (at least for now).
> We already use tons of Bash scripts.
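
A minimal sketch of the pinned-chart idea, written in Python rather than Bash so it can be unit-tested. The repository URL, the tag and the destination folder are hypothetical placeholders (no separate chart repo exists yet). The only git behaviour relied on is that `git clone --depth 1 --branch <ref>` accepts a tag as well as a branch.

```python
import subprocess
from typing import List

# All three values are hypothetical placeholders for illustration only.
CHART_REPO = "https://github.com/apache/airflow-helm-chart.git"
CHART_REF = "v0.1.0"  # pinned tag: bump deliberately, never track a branch
CHART_DEST = ".build/chart"


def pinned_clone_cmd(repo: str, ref: str, dest: str) -> List[str]:
    """Build a shallow `git clone` command for exactly one pinned ref."""
    return ["git", "clone", "--depth", "1", "--branch", ref, repo, dest]


def fetch_pinned_chart() -> None:
    """Run the clone; raises CalledProcessError if git exits non-zero."""
    subprocess.run(pinned_clone_cmd(CHART_REPO, CHART_REF, CHART_DEST), check=True)
```

Bumping the chart used by the k8s tests then becomes a one-line change to the pinned ref.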
>  
> One of the important benefits of separation is that changes in one component
> should not need changes in other components, at least not immediately.
>  
> Changes in the Helm chart and Dockerfile should never need changes in Airflow.
> Changes in Airflow should only ever need a change in the Dockerfile and Helm
> Chart after a new version is released.
>  
> I just had a talk with Daniel too and still didn't find a good enough
> reason to have them in the same repo.
>  
> I will definitely look at the Kubernetes approach (maybe it is better) and
> get back to this thread. But as of now I don't see any major PROs
> for having them in the same repo.
>  
> Regards,
> Kaxil
>  
>  
>  
> On Fri, Jul 3, 2020 at 5:00 PM Jarek Potiuk <jarek.pot...@polidea.com>
> wrote:
>  
>> I think Ry's point is an important one - I thought about writing a longer
>> post but I looked at the Kubernetes structure and I really like it so just
>> wanted to comment on this last one.
>>  
>> Seems that it is simply one "authoritative" (or source of truth) repo where
>> everything is developed in monorepo fashion but then there is a bot
>> that moves every commit related to subdirectories to those "split-out"
>> repos. There are never direct commits of people or PRs in the "split-out"
>> repositories. This is very similar to my original proposal to have
>> dedicated repos used for releases - but with an automated way of publishing
>> the commits to the "separated" repos at the moment, they are merged to
>> master in the main repo. I love it.
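
A rough sketch of the path-based routing such a bot would need: given the files a commit touches, decide which split-out repos should receive it. This is not how the Kubernetes publishing tooling is actually implemented. The directory prefixes and repo names are assumptions, borrowed from names proposed elsewhere in this thread.

```python
from typing import Dict, List, Set

# Hypothetical mapping: monorepo subdirectory -> split-out repo.
SPLIT_REPOS: Dict[str, str] = {
    "chart/": "apache/airflow-helm-chart",
    "docker/": "apache/airflow-docker-image",
}


def target_repos(changed_paths: List[str]) -> Set[str]:
    """Return the split-out repos affected by a commit touching these paths."""
    targets: Set[str] = set()
    for path in changed_paths:
        for prefix, repo in SPLIT_REPOS.items():
            if path.startswith(prefix):
                targets.add(repo)
    return targets
```

A commit that only touches files outside those prefixes yields an empty set and is never published to a split-out repo.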
>>  
>> I think it's a really good and "pragmatic" solution. The code is available in
>> separate repos, including the history of commits related to each "entity"
>> (so only chart-related commits in chart repo). Issues for particular
>> "entities" are in those separate repos as well (something that Kaxil
>> mentioned). Users (not developers!) who are interested only in Dockerfile
>> or Helm Chart have separate repos they can look at - with only relevant
>> changes and history of releases for that particular entity. They can raise
>> issues there (and in GitHub, we can easily refer to those issues from the
>> main "airflow" repo). All the discussion from "user issues" is kept in the
>> relevant repositories. Still - comments about development changes (and
>> related issues) might still be kept in the main "airflow" repo - next to
>> other "development" changes.
>>  
>> We can run separate releases from those linked repositories and even
>> publish sources directly from those repositories rather than from the main
>> one. At the same time - we avoid all the hassle of submodules.
>>  
>> We had a discussion with Daniel yesterday and we are both concerned about
>> all the overhead for people like us who work on all three "entities" at the
>> same time. Even just explaining how to work with Pull Requests and in what
>> sequence those PRs would have to be opened and merged in case of changes
>> that are spanning across several "entities" - was a challenge. I was unable
>> to clearly explain the sequence and way of reviewing/merging the PRs that
>> will have to be made if we have submodules. This is a bad sign as I was
>> using submodules in the past and know how it works but I was unable to
>> explain it clearly.
>>  
>> I really, really like the Kubernetes approach - seems that it's one of the
>> cases where we can "eat cake and have it too".
>>  
>> J.
>>  
>>  
>> On Thu, Jul 2, 2020 at 5:59 PM Ry Walker <r...@rywalker.com> wrote:
>>  
>> > One reason to have a monorepo is for project branding, and end user
>> > experience. But for component development experience, it's nice to have a
>> > small, dedicated repo.
>> >
>> > I think the git submodule approach is technically sound, but is at odds
>> > with making the project easy to consume/understand from the end user
>> > perspective, especially if we expand the use of subprojects. And the main
>> > Airflow commit graph would appear to be slowing down, which is bad for
>> > Airflow brand perception.
>> >
>> > Kubernetes has many sub-repos that are integrated into the main repo -
>> > which I think could be the best of both worlds:
>> > Example: https://github.com/kubernetes/kubernetes/tree/master/staging
>> >
>> > I haven't dug in very deeply, and I won't pretend to understand how
>> > challenging it may be to maintain this structure, but I'd support breaking
>> > more components out of the main Airflow repo for dev purposes (for example,
>> > in the future, it'd be nice to have airflow-cli, airflow-api,
>> > airflow-scheduler, individual provider repos that are cleanly separated) as
>> > long as we bring the commits/contributions back into the monorepo with
>> > automation.
>> >
>> > Maybe we could dive a little deeper into how K8s is operating, before going
>> > with submodules?
>> >
>> > -Ry
>> >
>> >
>> >
>> >
>> > On Thu, Jul 2, 2020 at 11:24 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
>> >
>> > > Let's come to a consensus first before we do anything :-)
>> > >
>> > > Is everyone happy with the separate repo approach? Let's wait for 72 hours
>> > > to hear from all and then have a plan on how we do it? WDYT?
>> > >
>> > > But indeed the git submodules approach sounds good. We do it for *Airflow
>> > > Site* (
>> > > https://github.com/apache/airflow-site/tree/master/landing-pages/site/themes
>> > > ) too.
>> > >
>> > > Regards,
>> > > Kaxil
>> > >
>> > > On Thu, Jul 2, 2020 at 4:15 PM Jarek Potiuk <jarek.pot...@polidea.com>
>> > > wrote:
>> > >
>> > > > Absolutely - I am happy to add "best practices" and short "howto do stuff
>> > > > with git submodules" - and this knowledge will only be needed for
>> > > > interacting with prod image/helmchart/running kubernetes tests. For all
>> > > > the other purposes it should be "business as usual".
>> > > >
>> > > > On Thu, Jul 2, 2020 at 4:53 PM Daniel Imberman <daniel.imber...@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > I think git submodules sounds like a great idea. We would need to write
>> > > > > this into the CONTRIBUTING.md to let people know how to do it but it’s a
>> > > > > “teach once” situation.
>> > > > >
>> > > > > via Newton Mail [
>> > > > >
>> > > >
>> > >
>> >
>> https://cloudmagic.com/k/d/mailapp?ct=dx&cv=10.0.50&pv=10.14.6&source=email_footer_2
>> > > > > ]
>> > > > > On Thu, Jul 2, 2020 at 2:44 AM, Tomasz Urbaszek <turbas...@apache.org>
>> > > > > wrote:
>> > > > > I support the idea of separate repos. The git submodules mentioned by
>> > > > > Jarek sound like an interesting solution. It may add some complexity
>> > > > > for new contributors but it's not rocket science. If we agree on using
>> > > > > this we should add a small how-to in contributing.rst I think (i.e. do I
>> > > > > have to have a fork of each repo?).
>> > > > >
>> > > > > As stressed previously, if we go this route we should make sure we have
>> > > > > nice testing of all those three components. Regarding the versioning,
>> > > > > I have no strong opinion but I fully support using separate issues for
>> > > > > airflow, docker, and helm.
>> > > > >
>> > > > > Tomek
>> > > > >
>> > > > >
>> > > > > On Thu, Jul 2, 2020 at 9:26 AM Jarek Potiuk <jarek.pot...@polidea.com>
>> > > > > wrote:
>> > > > > >
>> > > > > > On Thu, Jul 2, 2020 at 3:16 AM Daniel Imberman <daniel.imber...@gmail.com>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > I’m fine with keeping it as three separate repos but merging testing
>> > > > > > > somehow (e.g. the source code chart would pull the helm/docker chart
>> > > > > > > into .build) but we need to do it in a way that doesn’t make testing
>> > > > > > > too difficult.
>> > > > > > >
>> > > > > > > So for example: How do I test/integration test a change that involves
>> > > > > > > a change to all three and has to be done at the same time? Perhaps a
>> > > > > > > user can “register” a branch of helm and docker when they start up
>> > > > > > > breeze? Or perhaps we create a “parent” integration test that uses
>> > > > > > > the three together?
>> > > > > > >
>> > > > > >
>> > > > > > Yes, those are exactly my concerns when splitting the repos.
>> > > > > >
>> > > > > > I think testing for development should remain in the "airflow" repo. It
>> > > > > > is the "central one" in fact. I slept on it and I think using "released"
>> > > > > > versions for development testing will suffer from this "we need a change
>> > > > > > in all three of those".
>> > > > > >
>> > > > > > But we have an easy solution I think.
>> > > > > >
>> > > > > > I think that simply setting up submodules properly should do the job:
>> > > > > > https://git-scm.com/book/en/v2/Git-Tools-Submodules. They seem to be
>> > > > > > perfect for our case.
>> > > > > >
>> > > > > > For those who have not used them - in short - submodules work in the way
>> > > > > > that they register the "linked repos" and store the related "hash" of the
>> > > > > > commit from that linked repo. For example, the "chart" folder will be a
>> > > > > > link to "apache/airflow-helm-chart". We can also move the prod Dockerfile
>> > > > > > to a subfolder and link it to the separate repo. Git submodule has a
>> > > > > > built-in mechanism to a) update to the latest version of the repo, b)
>> > > > > > commit your changes to the linked repo from there, which is all we need.
>> > > > > > I used those a few times - I never liked submodules for sharing "library"
>> > > > > > code, but for sharing helm/Docker it seems perfect.
>> > > > > >
>> > > > > > From the "regular" developer point of view - you do not need to
>> > > > > > get/update submodules if you do not need to use them - so for all the
>> > > > > > development purposes if you only change the "airflow" code, you would not
>> > > > > > even need to sync chart or Dockerfile. You do "git checkout" as usual and
>> > > > > > it should work. So basically - no change for "regular" airflow
>> > > > > > development.
>> > > > > >
>> > > > > > However, if you do need to work on helm + Docker + code, then you simply
>> > > > > > do "git submodule update", go to the linked "helm" or "docker" folder,
>> > > > > > checkout the "master" version and you start making changes. The only
>> > > > > > thing to remember when you want to push your changes is to do
>> > > > > > `git push --recurse-submodules=check` and it will make sure that all the
>> > > > > > repos are updated. It is a bit involved, but the latest git versions have
>> > > > > > very good support and it must only be used by people who work on
>> > > > > > airflow + docker + helm - all the others are unaffected.
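
The workflow described above, written out as the git invocations it implies. This sketch only builds the command lines and executes nothing. The `chart` path used in the test is the hypothetical submodule folder from the example. Note that `--recurse-submodules=check` makes `git push` abort when a referenced submodule commit is not yet on the submodule's remote, whereas `on-demand` would push the submodules as well.

```python
from typing import List


def submodule_workflow(path: str, branch: str = "master") -> List[List[str]]:
    """Return the git commands for editing one submodule, in order."""
    return [
        # Fetch the commit recorded in the parent repo (first use needs --init).
        ["git", "submodule", "update", "--init", path],
        # Submodules start on a detached HEAD, so switch to a real branch.
        ["git", "-C", path, "checkout", branch],
        # ...edit and commit inside `path`, then commit the new hash in the parent...
        # Abort the push if the submodule commit is not on its remote yet.
        ["git", "push", "--recurse-submodules=check"],
    ]
```

Developers who only touch the main "airflow" code never need to run any of these.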
>> > > > > >
>> > > > > > From the CI perspective also nothing changes - when we checkout the code
>> > > > > > we will include submodules and our test harness will be largely
>> > > > > > unchanged. Submodule provides us with the right mechanism for cross
>> > > > > > dependency even if we use branches.
>> > > > > >
>> > > > > > If everyone is ok with that - I am happy to set it up. With submodules
>> > > > > > - we can switch to separate repos even without releasing helm and Prod
>> > > > > > chart "officially".
>> > > > > >
>> > > > > > J.
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > >
>> > > > > > > On Wed, Jul 1, 2020 at 3:20 PM, Jarek Potiuk <jarek.pot...@polidea.com>
>> > > > > > > wrote:
>> > > > > > > Sure. We can work with such an approach. There will be some dependencies
>> > > > > > > that we might find are problematic, but if we all see that it's
>> > > > > > > worth trying, there is a clear benefit that it makes for a "clean"
>> > > > > > > split between those different "entities". And possibly once we release
>> > > > > > > first versions of both image and chart, such problems will be rare and
>> > > > > > > easy to fix.
>> > > > > > >
>> > > > > > > I personally think such a split is inevitable eventually, it's just a
>> > > > > > > matter of when to do it. If we decide to make this happen soon - I am
>> > > > > > > more than happy to work on making the split reality.
>> > > > > > >
>> > > > > > > One prerequisite to that is that all those - Helm Chart, Prod Image and
>> > > > > > > Airflow - are released in stable versions separately "officially" - from
>> > > > > > > the current sources (otherwise there will be no way to test cross-repo).
>> > > > > > >
>> > > > > > > I think for that we will need to agree on the versioning scheme and
>> > > > > > > cadence for the Image and Helm Chart, then copy sources from airflow and
>> > > > > > > release them as "baseline" including setting up the tests for all of
>> > > > > > > those - then we can remove both Helm and Dockerfile from the airflow
>> > > > > > > repo. Happy to help with that if that's the direction we choose as a
>> > > > > > > community. It is important though that we keep the cross-repo testing
>> > > > > > > working. We have it working as of yesterday, so now the matter is -
>> > > > > > > whatever we do, we keep it running and have the development environment
>> > > > > > > support easy development and testing of any of the three (including
>> > > > > > > CI testing cross-repos). That's the only really important thing to me -
>> > > > > > > the rest is more of a technicality of how we link the repos, but the
>> > > > > > > principle remains.
>> > > > > > >
>> > > > > > > Do we have an idea for the versioning scheme that we would like to use
>> > > > > > > for the Helm Chart and prod image?
>> > > > > > >
>> > > > > > > Should we make it CalVer <https://calver.org/overview.html> or SemVer
>> > > > > > > <https://semver.org/> (or some other scheme)? And how should we treat
>> > > > > > > the combinations with Airflow?
>> > > > > > >
>> > > > > > > My thoughts (but I have no strong opinions as long as someone proposes
>> > > > > > > more sensible versioning schemes):
>> > > > > > >
>> > > > > > > 1) Airflow code - we continue the release scheme we have (with deciding
>> > > > > > > on 2.* scheme for the release). I expect in the future we might decide
>> > > > > > > on doing branches or patches so for 2.* I'd opt for going full SemVer
>> > > > > > > approach and patches released from branches.
>> > > > > > >
>> > > > > > > 2) I believe that Helm Chart can be versioned with its own version (then
>> > > > > > > you specify the image version as helm parameter). For the Helm Chart I
>> > > > > > > think CalVer might be OK as I do not expect any branching/patches in
>> > > > > > > the future - I'd expect that there will be a single stream of releases.
>> > > > > > >
>> > > > > > > 3) Dockerfile (+ related files such as .dockerignore, empty dir,
>> > > > > > > entrypoints etc). I do not imagine a lot of branching for those - we
>> > > > > > > should be able to release a new version of a Dockerfile (+ related
>> > > > > > > files) working with nearly any earlier Airflow release, so CalVer seems
>> > > > > > > like a good choice.
>> > > > > > >
>> > > > > > > 4) Image versioning becomes a bit more complex because the image tag is
>> > > > > > > always a combination of:
>> > > > > > > * Dockerfile (+ related files) version
>> > > > > > > * Airflow Version
>> > > > > > > * Python Version
>> > > > > > >
>> > > > > > > An example versioning I can imagine:
>> > > > > > >
>> > > > > > > *Airflow*: 1.10.11, 1.10.12, 2.0.0, 2.1.0, 2.1.1 - patch level (if we
>> > > > > > > decide to have patches).
>> > > > > > > *Dockerfile*: 2020.07.12, 2020.08.20 ... -> depending on when we release
>> > > > > > > them
>> > > > > > > *Helm Chart*: 2020.07.10, 2020.08.09 ... Each Helm Chart has a minimum
>> > > > > > > version of both Dockerfile and Airflow versions it works with.
>> > > > > > >
>> > > > > > > *Example Docker Image tags:*
>> > > > > > > apache/airflow:dockerfile2020.07.10-airflow1.10.10-python3.6
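
The tag scheme proposed above is mechanical enough to sketch. The function names are illustrative only. The format follows the example tag in this mail, with the repo name spelled `apache/airflow`, and nothing here is an agreed convention.

```python
from datetime import date


def calver(released: date) -> str:
    """Format a release date as a YYYY.MM.DD CalVer string."""
    return released.strftime("%Y.%m.%d")


def image_tag(dockerfile: str, airflow: str, python: str,
              repo: str = "apache/airflow") -> str:
    """Combine the three component versions into one image reference."""
    return f"{repo}:dockerfile{dockerfile}-airflow{airflow}-python{python}"


print(image_tag(calver(date(2020, 7, 10)), "1.10.10", "3.6"))
# apache/airflow:dockerfile2020.07.10-airflow1.10.10-python3.6
```

Keeping all three versions in the tag makes every combination addressable, at the cost of long tags.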
>> > > > > > >
>> > > > > > > WDYT?
>> > > > > > >
>> > > > > > > J,
>> > > > > > >
>> > > > > > >
>> > > > > > > On Wed, Jul 1, 2020 at 11:12 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
>> > > > > > >
>> > > > > > > > I think we should have "separate repos for development" too.
>> > > > > > > >
>> > > > > > > > 3 Repos in total:
>> > > > > > > >
>> > > > > > > > 1) apache/airflow
>> > > > > > > > 2) apache/airflow-docker-image
>> > > > > > > > 3) apache/airflow-helm-chart
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > (1) *apache/airflow* should use a pinned stable version of Airflow Helm
>> > > > > > > > chart to run Kubernetes tests
>> > > > > > > > (2) *apache/airflow* already has *Dockerfile.ci* file which it can use
>> > > > > > > > to run airflow tests on docker images.
>> > > > > > > > (3) *apache/airflow-docker-image* should use the latest available
>> > > > > > > > stable version of airflow
>> > > > > > > > (4) *apache/airflow-helm-chart* should use the latest available stable
>> > > > > > > > version of airflow
>> > > > > > > >
>> > > > > > > > > Having such split also makes some updates more difficult - for
>> > > > > > > > > example if we add new "extra" to Airflow that will require to
>> > > > > > > > > install "apt" dependency in Dockerfile, we will have to split it
>> > > > > > > > > into first adding the dependency to Dockerfile, and once it is
>> > > > > > > > > merged, we can add the extra to airflow with setup.py.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > Adding a new extra to setup.py would not (and should not) impact the
>> > > > > > > > development of *apache/airflow-docker-image*.
>> > > > > > > > Once an RC is cut for apache/airflow or after a new version is released
>> > > > > > > > for apache/airflow, we can work on supporting the new airflow version in
>> > > > > > > > the Production Docker Image.
>> > > > > > > > While doing that we can add all the libraries that are needed by the new
>> > > > > > > > Airflow Version and we will have a clean commit history and changelog
>> > > > > > > > for Docker image.
>> > > > > > > >
>> > > > > > > > We definitely do not need to work in parallel on both the repos. By
>> > > > > > > > doing development in a separate repo we keep consistent "source" files
>> > > > > > > > and we can release each artifact with a separate cadence. If someone
>> > > > > > > > discovers a bug in a newly released Docker image, we should easily be
>> > > > > > > > able to cut a new release with the patch without worrying about how
>> > > > > > > > development is going in the apache/airflow repo.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > *Apache Flink & Apache CouchDB* do it in a similar manner:
>> > > > > > > >
>> > > > > > > > https://github.com/apache/flink & https://github.com/apache/flink-docker
>> > > > > > > > https://github.com/apache/couchdb &
>> > > > > > > > https://github.com/apache/couchdb-docker
>> > > > > > > >
>> > > > > > > > Regards,
>> > > > > > > > Kaxil
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > On Wed, Jul 1, 2020 at 9:50 PM Jarek Potiuk <jarek.pot...@polidea.com>
>> > > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > I do not think it's only the question of Mono/Multi repos. While I
>> > > > > > > > > clearly see the benefit of separate repos I also see some drawbacks.
>> > > > > > > > >
>> > > > > > > > > And if it bothers others, I am happy to follow the majority. If we
>> > > > > > > > > think that a bit more complexity in testing justifies separating those
>> > > > > > > > > three completely and having it more "clean" - it's also workable but
>> > > > > > > > > IMHO introduces certain complexity in development.
>> > > > > > > > >
>> > > > > > > > > However I think this is not 0/1. A kind of Hybrid approach in my
>> > > > > > > > > opinion might be best of both worlds - development and releases.
>> > > > > > > > >
>> > > > > > > > > Let me explain what I mean by "Hybrid":
>> > > > > > > > >
>> > > > > > > > > I think we definitely should have separate repositories to release
>> > > > > > > > > those artifacts and I think there is no doubt about it:
>> > > > > > > > >
>> > > > > > > > > * airflow (apache/airflow)
>> > > > > > > > > * prod docker image (apache/airflow-docker)
>> > > > > > > > > * helm chart (apache/airflow-helm)
>> > > > > > > > > * api clients (we already have separate repos for those)
>> > > > > > > > > (apache/airflow-client-*)
>> > > > > > > > >
>> > > > > > > > > I think the only question is where we develop all those (develop !=
>> > > > > > > > > release). There are certain benefits of having a single "master" (let's
>> > > > > > > > > call it "development" further) for all those artifacts. Currently the
>> > > > > > > > > "development" version for all of those is in one repo - and while
>> > > > > > > > > developing one depends on the other, we also test all of those together
>> > > > > > > > > and this means that the "current best" set of airflow sources (including
>> > > > > > > > > dependencies in setup.py), Dockerfile and Helm chart work. This means
>> > > > > > > > > for example that you will not be able to break the Helm Chart by
>> > > > > > > > > changing anything that the helm chart depends on in airflow. For example
>> > > > > > > > > if you change "airflow webserver" into "airflow server" the current helm
>> > > > > > > > > chart will break. Similarly if you change entrypoint.sh in Docker image
>> > > > > > > > > in a way that is not compatible with Helm chart, we will not let that
>> > > > > > > > > happen - the CI tests will break if either of those changes in an
>> > > > > > > > > incompatible way. And we can have dependencies in any direction between
>> > > > > > > > > those three. When we see a commit break any of the three - we can make
>> > > > > > > > > a decision about what to do - either accept and document the
>> > > > > > > > > incompatibility or fix it.
>> > > > > > > > >
>> > > > > > > > > Of course keeping that property (testing it all together) is also
>> > > > > > > > > possible if they are in completely separate repos. There are several
>> > > > > > > > > cross-dependencies - Docker image building depends on dependencies in
>> > > > > > > > > setup.py for example, you cannot build Docker image from only Dockerfile
>> > > > > > > > > without the sources of airflow nor build and test helm charts without
>> > > > > > > > > the image (and sources - because that's where the current kubernetes
>> > > > > > > > > tests are). If we want to continue doing it for both Helm and
>> > > > > > > > > Dockerfile, we would have to basically check out the latest sources of
>> > > > > > > > > Airflow and run the CI tests before merging any Docker or Helm Chart
>> > > > > > > > > changes and the opposite - we will have to download Dockerfile/Helm
>> > > > > > > > > chart and build image/install Helm chart when we are running CI tests
>> > > > > > > > > for Airflow. This is possible and we could do it, but it adds complexity
>> > > > > > > > > to the build/CI process.
>> > > > > > > > >
>> > > > > > > > > Having such split also makes some updates more difficult - for example
>> > > > > > > > > if we add new "extra" to Airflow that will require to install "apt"
>> > > > > > > > > dependency in Dockerfile, we will have to split it into first adding the
>> > > > > > > > > dependency to Dockerfile, and once it is merged, we can add the extra to
>> > > > > > > > > airflow with setup.py. This makes it quite difficult to test it together
>> > > > > > > > > though (the Dockerfile change can only be tested fully after merging it
>> > > > > > > > > to master). Not mentioning complexity of managing different versions -
>> > > > > > > > > your local development Dockerfile version vs sources of Airflow for
>> > > > > > > > > example. Imagine switching between branches where you add two different
>> > > > > > > > > apt dependencies to the Dockerfile. There are more similar scenarios I
>> > > > > > > > > can imagine - especially for parallel changes in those repos.
>> > > > > > > > >
>> > > > > > > > > This is of course doable to keep them separate, but it is quite a bit
>> > > > > > > > > more complex to set up (especially for a consistent development
>> > > > > > > > > environment) when you have separate repos, and preventing
>> > > > > > > > > cross-breaking changes might be more difficult.
>> > > > > > > > >
>> > > > > > > > > I believe that the best way is to continue developing airflow + image +
>> > > > > > > > > chart in one repo - airflow, but release them from those separate repos.
>> > > > > > > > >
>> > > > > > > > > Airflow source release does not have to contain either chart or image.
>> > > > > > > > > And even if it contains sources for those, they are not the final
>> > > > > > > > > "artifacts" (installable image and installable helm chart).
>> > > > > > > > > Whenever we decide to release either of them - we test it in
>> > > > > > > > > "development". Then only when it is tested, we copy the sources to those
>> > > > > > > > > separate repos and release them.
>> > > > > > > > >
>> > > > > > > > > With git - we can even do it very easily while preserving the history
>> > > > > > > > > of commits (been there, done that). And then we could release Helm and
>> > > > > > > > > Docker image separately based on the commits and tags in those separate
>> > > > > > > > > repositories.
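
One way to copy a subdirectory to a separate repo while keeping its commit history is `git subtree split`. The mail does not say which mechanism was meant, so treat this as an assumption. The sketch only builds the command lines. The `chart` prefix, the `chart-repo` remote name and the `split` branch name used below are hypothetical.

```python
from typing import List


def split_and_publish(prefix: str, remote: str, branch: str = "split") -> List[List[str]]:
    """Commands that extract the history of `prefix` and push it to `remote`."""
    return [
        # Create `branch` containing only the commits that touched `prefix`.
        ["git", "subtree", "split", f"--prefix={prefix}", "-b", branch],
        # Publish that filtered history as master of the release repo.
        ["git", "push", remote, f"{branch}:master"],
    ]
```

Release tags would then be created in the separate repo, on top of the pushed history.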
>> > > > > > > > >
>> > > > > > > > > I agree that separate repos is a more "clean" approach. But I think it
>> > > > > > > > > is less convenient for development consistency.
>> > > > > > > > >
>> > > > > > > > > J,
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > On Wed, Jul 1, 2020 at 9:35 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
>> > > > > > > > >
>> > > > > > > > > > Forgot to mention, having them in separate repo also helps in better
>> > > > > > > > > > managing each individual artifact.
>> > > > > > > > > >
>> > > > > > > > > > Each repo would have separate GitHub Issues where we can track the
>> > > > > > > > > > issues specific to Helm chart or Dockerfile.
>> > > > > > > > > >
>> > > > > > > > > > Regards,
>> > > > > > > > > > Kaxil
>> > > > > > > > > >
>> > > > > > > > > > On Wed, Jul 1, 2020 at 8:30 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
>> > > > > > > > > >
>> > > > > > > > > > > The PMC also needs to agree if we want separate VOTING for Docker
>> > > > > > > > > > > Image and Helm chart, I think we do.
>> > > > > > > > > > >
>> > > > > > > > > > > Regards,
>> > > > > > > > > > > Kaxil
>> > > > > > > > > > >
>> > > > > > > > > > > On Wed, Jul 1, 2020 at 8:06 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
>> > > > > > > > > > >
>> > > > > > > > > > >> Hi all,
>> > > > > > > > > > >>
>> > > > > > > > > > >> What do you all think about having Dockerfile and Helm chart in the same
>> > > > > > > > > > >> "Airflow" Repo vs separate?
>> > > > > > > > > > >>
>> > > > > > > > > > >> I feel having a separate repo for Airflow Dockerfile and Helm chart has
>> > > > > > > > > > >> more benefits like easy to track changes (via Changelog), easy for new
>> > > > > > > > > > >> contributors, separate release cadence.
>> > > > > > > > > > >>
>> > > > > > > > > > >> Currently, Dockerfile and Helm Chart are inside the same repo and when
>> > > > > > > > > > >> we release changelog for a new Airflow version, it would include all
>> > > > > > > > > > >> changes (Airflow + Dockerfile + Helm chart) which I think is not that
>> > > > > > > > > > >> great.
>> > > > > > > > > > >>
>> > > > > > > > > > >> Also having them all inside a single repo means changes in Helm Chart and
>> > > > > > > > > > >> Dockerfile can block Airflow release. We could use stable Helm Chart
>> > > > > > > > > > >> version and Dockerfile version to test Airflow so that they are not
>> > > > > > > > > > >> blockers to release too.
>> > > > > > > > > > >>
>> > > > > > > > > > >> Happy to hear the thoughts from the community.
>> > > > > > > > > > >>
>> > > > > > > > > > >> Regards,
>> > > > > > > > > > >> Kaxil
>> > > > > > > > > > >>
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > --
>> > > > > > > > >
>> > > > > > > > > Jarek Potiuk
>> > > > > > > > > Polidea <https://www.polidea.com/> | Principal Software
>> > > Engineer
>> > > > > > > > >
>> > > > > > > > > M: +48 660 796 129 <+48660796129>
>> > > > > > > > > [image: Polidea] <https://www.polidea.com/>
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > >
>> >
>>  
>>  
>>  
>
