Monorepo FTW. Yes, it gets a little messier around release time, but automatically extracting the commits (or parts of commits) to a separate repo for releasing may well be the solution to that problem.
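
To make "parts of commits" a bit more concrete, here is a rough Python sketch of what such an extraction step boils down to: project the monorepo history onto one subdirectory, drop the commits that never touch it, and strip the directory prefix from the paths that remain. This is only a toy model under my own assumptions, not how any real publishing bot is implemented. The `Commit` class, the `chart/` and `docker/` directory names and the sample history are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Commit:
    """A toy commit: an id, a message, and the files it changes (path -> content)."""
    sha: str
    message: str
    changes: Dict[str, str]


def split_commit(commit: Commit, prefix: str) -> Optional[Commit]:
    """Keep only the part of a commit that lives under `prefix`.

    Returns None when the commit does not touch the subdirectory at all,
    so it would never be published to the split-out repo.
    """
    root = prefix.rstrip("/") + "/"
    kept = {
        path[len(root):]: content  # strip the subdirectory prefix
        for path, content in commit.changes.items()
        if path.startswith(root)
    }
    if not kept:
        return None
    # Real tooling would produce a new commit id here; the original id is
    # kept in this sketch only so the result is easy to trace back.
    return Commit(sha=commit.sha, message=commit.message, changes=kept)


def publish_history(history: List[Commit], prefix: str) -> List[Commit]:
    """Project a monorepo history onto one subdirectory, preserving order."""
    projected = (split_commit(commit, prefix) for commit in history)
    return [commit for commit in projected if commit is not None]


# Hypothetical monorepo history: one commit spans Airflow code and the Dockerfile.
monorepo = [
    Commit("a1", "Rename CLI command", {"airflow/cli.py": "..."}),
    Commit("b2", "Add extra and its apt dependency",
           {"setup.py": "...", "docker/Dockerfile": "..."}),
    Commit("c3", "Bump chart defaults", {"chart/values.yaml": "..."}),
]

chart_repo = publish_history(monorepo, "chart")
docker_repo = publish_history(monorepo, "docker")
```

In this model the cross-cutting commit `b2` is reviewed and merged once in the monorepo, and only its Dockerfile part ends up in the release repo. In real git, `git subtree split --prefix=<dir>` or `git filter-branch --subdirectory-filter <dir>` do roughly this kind of projection, though I have not checked which mechanism the Kubernetes bot uses.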
-ash

On Jul 3 2020, at 7:51 pm, Kaxil Naik <kaxiln...@gmail.com> wrote:
> I will take a look at the Kubernetes approach and get back to this thread.
>
>> We had a discussion with Daniel yesterday and we are both concerned about
>> all the overhead for people like us who work on all three "entities" at the
>> same time. Even just explaining how to work with Pull Requests and in what
>> sequence those PRs would have to be opened and merged in case of changes
>> that are spanning across several "entities" - was a challenge. I was unable
>> to clearly explain the sequence and way of reviewing/merging the PRs that
>> will have to be made if we have submodules. This is a bad sign as I was
>> using submodules in the past and know how it works but I was unable to
>> explain it clearly.
>
> We don't even need submodules tbh. We can just use a Bash script that pulls a
> pinned Helm Chart version.
> We only need the Helm chart to run integration tests for k8s (at least for now).
> We already use tons of Bash scripts.
>
> One of the important benefits of separation is that changes in one component
> should not need changes in another component, at least not immediately.
>
> Changes in Helm chart and Docker file should never need changes in Airflow.
> Changes in Airflow should only ever need a change in Dockerfile and Helm
> Chart after a new version is released.
>
> I just had a talk with Daniel too and still didn't find a good enough
> reason to have them in the same repo.
>
> I will definitely look at the Kubernetes approach (maybe it is better) and
> get back to this thread. But as of now I don't see any major PROs
> for having them in the same repo.
>
> Regards,
> Kaxil
>
> On Fri, Jul 3, 2020 at 5:00 PM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
>
>> I think Ry's point is an important one - I thought about writing a longer
>> post but I looked at the Kubernetes structure and I really like it so just
>> wanted to comment on this last one.
>>
>> Seems that it is simply one "authoritative" (or source of truth) repo where
>> everything is developed in monorepo fashion but then there is a bot
>> that moves every commit related to subdirectories to those "split-out"
>> repos. There are never direct commits of people or PRs in the "split-out"
>> repositories. This is very similar to my original proposal to have
>> dedicated repos used for releases - but with an automated way of publishing
>> the commits to the "separated" repos at the moment they are merged to
>> master in the main repo. I love it.
>>
>> I think it's a really good and "pragmatic" solution. The code is available in
>> separate repos, including the history of commits related to each "entity"
>> (so only chart-related commits in chart repo). Issues for particular
>> "entities" are in those separate repos as well (something that Kaxil
>> mentioned). Users (not developers!) who are interested only in Dockerfile
>> or Helm Chart have separate repos they can look at - with only relevant
>> changes and history of releases for that particular entity. They can raise
>> issues there (and in GitHub, we can easily refer to those issues from the
>> main "airflow" repo). All the discussion from "user issues" is kept in the
>> relevant repositories. Still - comments about development changes (and
>> related issues) might still be kept in the main "airflow" repo - next to
>> other "development" changes.
>>
>> We can run separate releases from those linked repositories and even
>> publish sources directly from those repositories rather than from the main
>> one. At the same time - we avoid all the hassle of submodules.
>>
>> We had a discussion with Daniel yesterday and we are both concerned about
>> all the overhead for people like us who work on all three "entities" at the
>> same time. Even just explaining how to work with Pull Requests and in what
>> sequence those PRs would have to be opened and merged in case of changes
>> that are spanning across several "entities" - was a challenge. I was unable
>> to clearly explain the sequence and way of reviewing/merging the PRs that
>> will have to be made if we have submodules. This is a bad sign as I was
>> using submodules in the past and know how it works but I was unable to
>> explain it clearly.
>>
>> I really, really like Kubernetes approach - seems that it's one of the
>> cases where we can "eat cake and have it too".
>>
>> J.
>>
>> On Thu, Jul 2, 2020 at 5:59 PM Ry Walker <r...@rywalker.com> wrote:
>>
>> > One reason to have a monorepo is for project branding, and end user
>> > experience. But for component development experience, it's nice to have a
>> > small, dedicated repo.
>> >
>> > I think the git submodule approach is technically sound, but is at odds
>> > with making the project easy to consume/understand from the end user
>> > perspective, especially if we expand the use of subprojects. And the main
>> > Airflow commit graph would appear to be slowing down which is bad for
>> > Airflow brand perception.
>> >
>> > Kubernetes has many sub-repos that are integrated into the main repo -
>> > which I think could be the best of both worlds:
>> > Example: https://github.com/kubernetes/kubernetes/tree/master/staging
>> >
>> > I haven't dug in very deeply, and I won't pretend to understand how
>> > challenging it may be to maintain this structure, but I'd support breaking
>> > more components out of the main Airflow repo for dev purposes (for example,
>> > in the future, it'd be nice to have airflow-cli, airflow-api,
>> > airflow-scheduler, individual provider repos that are cleanly separated) as
>> > long as we bring the commits/contributions back into the monorepo with
>> > automation.
>> >
>> > Maybe we could dive a little deeper into how K8s is operating, before going
>> > with submodules?
>> >
>> > -Ry
>> >
>> > On Thu, Jul 2, 2020 at 11:24 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
>> >
>> > > Let's come to a consensus first before we do anything :-)
>> > >
>> > > Is everyone happy with separate repo approach? Let's wait for 72 hours to
>> > > hear from all and then have a plan on how we do it? WDYT?
>> > >
>> > > But indeed git submodules approach sounds good. We do it for *Airflow Site* (
>> > > https://github.com/apache/airflow-site/tree/master/landing-pages/site/themes
>> > > ) too.
>> > >
>> > > Regards,
>> > > Kaxil
>> > >
>> > > On Thu, Jul 2, 2020 at 4:15 PM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
>> > >
>> > > > Absolutely - I am happy to add "best practices" and short "howto do stuff
>> > > > with git submodules" - and this knowledge will only be needed for
>> > > > interacting with prod image/helmchart/running kubernetes tests. For all the
>> > > > other purposes it should be "business as usual".
>> > > >
>> > > > On Thu, Jul 2, 2020 at 4:53 PM Daniel Imberman <daniel.imber...@gmail.com> wrote:
>> > > >
>> > > > > I think git submodules sounds like a great idea. We would need to write
>> > > > > this into the CONTRIBUTING.md to let people know how to do it but it’s a
>> > > > > “teach once” situation.
>> > > > >
>> > > > > via Newton Mail [
>> > > > > https://cloudmagic.com/k/d/mailapp?ct=dx&cv=10.0.50&pv=10.14.6&source=email_footer_2
>> > > > > ]
>> > > > > On Thu, Jul 2, 2020 at 2:44 AM, Tomasz Urbaszek <turbas...@apache.org> wrote:
>> > > > > I support the idea of separate repos. The git submodules mentioned by
>> > > > > Jarek sounds like an interesting solution. It may add some complexity
>> > > > > for new contributors but it's not rocket science.
If we agree on >> > using >> > > > > this we should add small how-to in contributing.rst I think (i.e. >> do >> > I >> > > > > have to have fork of each repo?). >> > > > > >> > > > > As stressed previously if we go this route we should make >> sure we >> > have >> > > > > nice testing of all those three components. Regarding the >> versioning, >> > > > > I have no strong opinion but I fully support using separate issues >> > for >> > > > > airflow, docker, and helm. >> > > > > >> > > > > Tomek >> > > > > >> > > > > >> > > > > On Thu, Jul 2, 2020 at 9:26 AM Jarek Potiuk < >> > jarek.pot...@polidea.com> >> > > > > wrote: >> > > > > > >> > > > > > On Thu, Jul 2, 2020 at 3:16 AM Daniel Imberman < >> > > > > daniel.imber...@gmail.com> >> > > > > > wrote: >> > > > > > >> > > > > > I’m fine with keeping it as three separate repos but merging >> > testing >> > > > > > > somehow (e.g. the source code chart would pull the helm/docker >> > > chart >> > > > > into >> > > > > > > .build) but we need to do it in a way that doesn’t make testing >> > too >> > > > > > > difficult. >> > > > > > > >> > > > > > > So for example: How do I test/integration test a change that >> > > > involves a >> > > > > > > change to all three and has to be done at the same time? >> Perhaps >> > a >> > > > > user can >> > > > > > > “register” a branch of helm and docker when they start up >> breeze? >> > > Or >> > > > > > > perhaps we create a “parent” integration test that uses the >> three >> > > > > together? >> > > > > > > >> > > > > > >> > > > > > Yes, those are exactly my concerns when splitting the repos. >> > > > > > >> > > > > > I think testing for development should remain in the "airflow" >> > repo. >> > > It >> > > > > is >> > > > > > the "central one" in fact. I slept it over and I think using >> > > "released" >> > > > > > versions for development testing will suffer from this "we >> need a >> > > > change >> > > > > in >> > > > > > all three of those". 
>> > > > > > >> > > > > > But we have an easy solution I think. >> > > > > > >> > > > > > I think that simply setting submodules properly should do >> to the >> > job: >> > > > > > https://git-scm.com/book/en/v2/Git-Tools-Submodules. They seem >> to >> > be >> > > > > > perfect for our case. >> > > > > > >> > > > > > For those who have not used it - in short - submodules work in >> the >> > > way >> > > > > that >> > > > > > they register the "linked repos" and store related "hash" >> of the >> > > commit >> > > > > > from that linked repo. For example, the "chart" folder will >> be a >> > link >> > > > to >> > > > > > "apache/airflow-helm-chart". We can also move the prod Dockerfile >> > to >> > > a >> > > > > > subfolder and link it to the separate repo. Git submodule >> has a >> > > > > > built-in mechanism to a) update to the latest version of the >> repo, >> > b) >> > > > > > commit your changes to the linked repo from there which is >> all we >> > > > need. I >> > > > > > used those few times - I never liked submodules for sharing >> > "library" >> > > > > code, >> > > > > > but for sharing helm/Docker It seems perfect. >> > > > > > >> > > > > > From the "regular" developer point of view - you do not >> need to >> > > > > get/update >> > > > > > submodules if you do not need to use them - so for all the >> > > development >> > > > > > purposes if you only change the "airflow" code, you would not >> even >> > > need >> > > > > to >> > > > > > sync chart or Dockerfile. You do "git checkout" as usual >> and it >> > > should >> > > > > > work. So basically - no change for "regular" airflow development. >> > > > > > >> > > > > > However, if you do need to work on helm + Docker + code, >> then you >> > > > simply >> > > > > to >> > > > > > "git submodule update", go to the linked "helm" or "docker" >> folder, >> > > > > > checkout the "master" version and you start making changes. 
The >> > only >> > > > > thing >> > > > > > to remember when you want to push your changes is to do >> `git push >> > > > > > --recurse-sumbodules="check" ` and it will make sure that >> all the >> > > repos >> > > > > are >> > > > > > updated, It is a bit involved, but latest git version have >> a very >> > > good >> > > > > > support and it must only be used by people who work on >> airflow + >> > > > docker + >> > > > > > helm - all the others are unaffected. >> > > > > > >> > > > > > From the CI perspective also nothing changes - when we checkout >> the >> > > > code >> > > > > we >> > > > > > will include submodules and our test harness will be largely >> > > unchanged. >> > > > > > Submodule provides us with the right mechanism for cross >> dependency >> > > > even >> > > > > if >> > > > > > we use branches. >> > > > > > >> > > > > > If everyone will be ok with that - I am happy to set it up, With >> > > > > submodules >> > > > > > - we can switch to separate repos even without releasing >> helm and >> > > Prod >> > > > > > chart "officially". >> > > > > > >> > > > > > J. >> > > > > > >> > > > > > >> > > > > > >> > > > > > > >> > > > > > > via Newton Mail [ >> > > > > > > >> > > > > >> > > > >> > > >> > >> https://cloudmagic.com/k/d/mailapp?ct=dx&cv=10.0.50&pv=10.14.6&source=email_footer_2 >> > > > > > > ] >> > > > > > > On Wed, Jul 1, 2020 at 3:20 PM, Jarek Potiuk < >> > > > jarek.pot...@polidea.com >> > > > > > >> > > > > > > wrote: >> > > > > > > Sure. We can work with such an approach. There will be some >> > > > > dependencies >> > > > > > > that we might find are problematic, but If we all see >> that it's >> > > > > > > worth trying, there is a clear benefit that it makes for a >> > "clean" >> > > > > > > split between those different "entities". And possibly >> once we >> > > > release >> > > > > > > first versions of both image and chart, such problems >> will be >> > rare >> > > > and >> > > > > easy >> > > > > > > to fix. 
>> > > > > > > >> > > > > > > I personally think such split is inevitable eventually, it's >> > just a >> > > > > matter >> > > > > > > when to do it. If we decide to make this happen soon - I am >> more >> > > than >> > > > > happy >> > > > > > > to work on making the split reality. >> > > > > > > >> > > > > > > One prerequisite to that is that all those - Helm Chart, Prod >> > Image >> > > > and >> > > > > > > Airflow are released in stable versions separately >> "officially" - >> > > > from >> > > > > the >> > > > > > > current sources (otherwise there will be no way to test >> > > cross-repo). >> > > > > > > >> > > > > > > I think for that we will need to agree on the versioning scheme >> > and >> > > > > cadence >> > > > > > > for the Image and Helm Chart, then copy sources from airflow >> and >> > > > > release >> > > > > > > them as "baseline" including setup the tests for all of >> those - >> > > then >> > > > we >> > > > > > > can remove both Helm and Dockerfile from the airflow repo. >> Happy >> > to >> > > > > help >> > > > > > > with that if that's the direction we choose as a >> community. It >> is >> > > > > important >> > > > > > > though that we keep the cross-repo testing working. We >> have it >> > > > working >> > > > > as >> > > > > > > of yesterday, so now the matter is - whatever we do we >> keep it >> > > > running >> > > > > and >> > > > > > > have development environment support easy development and >> testing >> > > of >> > > > > > > either of the three (including CI testing cross-repos) , That's >> > the >> > > > > only >> > > > > > > really important thing to me - the rest is more of technicality >> > how >> > > > we >> > > > > link >> > > > > > > the repos, but principle remains. >> > > > > > > >> > > > > > > Do we have an idea for the versioning scheme that we >> would like >> > to >> > > > use >> > > > > for >> > > > > > > the Helm Chart and prod image ? 
>> > > > > > > >> > > > > > > Should we make it CalVer >> <https://calver.org/overview.html> or >> > > > SemVer >> > > > > > > <https://semver.org/> (or some other scheme)? And how should >> we >> > > > treat >> > > > > the >> > > > > > > combinations with Airflow? >> > > > > > > >> > > > > > > My thoughts (but I have no strong opinions as long as someone >> > > > proposes >> > > > > more >> > > > > > > sensible versioning schemes): >> > > > > > > >> > > > > > > 1) Airflow code - we continue the release scheme we have (with >> > > > > deciding on >> > > > > > > 2.* scheme for the release). I expect in the future we might >> > decide >> > > > on >> > > > > > > doing branches or patches so for 2.* I'd opt for going full >> > SemVer >> > > > > approach >> > > > > > > and patches released from branches. >> > > > > > > >> > > > > > > 2) I believe that Helm Chart can be versioned with its own >> > version >> > > > > (then >> > > > > > > you specify the image version as helm parameter). For the Helm >> > > Chart >> > > > I >> > > > > > > think CalVer might be OK as I do not expect any >> branching/patches >> > > in >> > > > > the >> > > > > > > future - I'd expect that there will be a single stream of >> > releases. >> > > > > > > >> > > > > > > 3) Dockerfile (+ related files such as .dockerignore, empty >> dir, >> > > > > > > entrypoints etc). i do not imagine a lot of branching for >> those - >> > > we >> > > > > > > should be able to release a new version of a Dockerfile (+ >> > related >> > > > > files) >> > > > > > > working with nearly any earlier Airflow release, so CalVer >> seems >> > > > like a >> > > > > > > good choice. 
>> > > > > > > >> > > > > > > 4) Image versioning becomes a bit most complex because the >> image >> > > tag >> > > > is >> > > > > > > always combination of: >> > > > > > > * Dockerfile (+ related files) version >> > > > > > > * Airflow Version >> > > > > > > * Python Version >> > > > > > > >> > > > > > > An example versioning I can imagine: >> > > > > > > >> > > > > > > *Airflow*: 1.10.11, 1.10.12, 2.0.0, 2.1.0, 2.1.1 - patch level >> > (if >> > > we >> > > > > > > decide to have patches). >> > > > > > > *Dockerfile: *2020.07.12, 2020.08.20...... -> depending >> when we >> > > > release >> > > > > > > them >> > > > > > > *Helm Chart*: 2020.07.10, 2020.08.09 ...... Each Helm Chart >> has a >> > > > > minimum >> > > > > > > version of both Dockerfile and Airflow versions it works with. >> > > > > > > >> > > > > > > *Example Docker Image tags:* >> > > > > > > apache/airlflow:dockerfile2020.07.10-airflow1.10.10-python3.6 >> > > > > > > >> > > > > > > WDYT? >> > > > > > > >> > > > > > > J, >> > > > > > > >> > > > > > > >> > > > > > > On Wed, Jul 1, 2020 at 11:12 PM Kaxil Naik < >> kaxiln...@gmail.com> >> > > > > wrote: >> > > > > > > >> > > > > > > > I think we should have "separate repos for development" too. >> > > > > > > > >> > > > > > > > 3 Repos in total: >> > > > > > > > >> > > > > > > > 1) apache/airflow >> > > > > > > > 2) apache/airflow-docker-image >> > > > > > > > 3) apache/airflow-helm-chart >> > > > > > > > >> > > > > > > > >> > > > > > > > (1) *apache/airflow* should use a pinned stable version of >> > > Airflow >> > > > > Helm >> > > > > > > > chart to run Kubernetes tests >> > > > > > > > (2) *apache/airflow* already has *Dockerfile.ci* file which >> it >> > > can >> > > > > use to >> > > > > > > > run airflow tests on docker images. 
>> > > > > > > > (3) *apache/airflow-docker-image *should use the latest >> > available >> > > > > stable >> > > > > > > > version of airflow >> > > > > > > > (4) *apache/airflow-helm-chart *should use the latest >> available >> > > > > stable >> > > > > > > > version of airflow >> > > > > > > > >> > > > > > > > Having such split also makes some updates more >> difficult - >> for >> > > > > example if >> > > > > > > > > we add new "extra" to Airflow that will require to install >> > > "apt" >> > > > > > > > dependency >> > > > > > > > > in Dockerfile, we will have to split it into first adding >> the >> > > > > > > dependency >> > > > > > > > to >> > > > > > > > > Dockerfile, and once it is merged, we can add the >> extra to >> > > > airflow >> > > > > with >> > > > > > > > > setup.py. >> > > > > > > > >> > > > > > > > >> > > > > > > > Adding a new extra to setup.py would not (and should not) >> > impact >> > > > the >> > > > > > > > development of *apache/airflow-docker-image* >> > > > > > > > Once an RC is cut for apache/airflow or after a new version >> is >> > > > > released >> > > > > > > for >> > > > > > > > apache/airflow, we can work on supporting the new airflow >> > version >> > > > in >> > > > > the >> > > > > > > > Production Docker Image. >> > > > > > > > While doing that we can add all the libraries that are needed >> > by >> > > > the >> > > > > new >> > > > > > > > Airflow Version and we will have a clean commit history and >> > > > > changelog for >> > > > > > > > Docker image. >> > > > > > > > >> > > > > > > > We definitely do not need to work parallelly on both the >> repos. >> > > By >> > > > > doing >> > > > > > > > development in a separate repo we keep consistent "source" >> > files >> > > > and >> > > > > we >> > > > > > > can >> > > > > > > > release each artifact with a >> > > > > > > > separate cadence. 
If someone discovers bug in newly released >> > > > > Dockerimage, >> > > > > > > > we should be easily able to cut out a new release with the >> > patch >> > > > > without >> > > > > > > > worrying about how development is >> > > > > > > > going in the apache/airflow repo. >> > > > > > > > >> > > > > > > > >> > > > > > > > *Apache Flink & Apache CoucheDB *does it in the similar >> manner: >> > > > > > > > >> > > > > > > > https://github.com/apache/flink & >> > > > > https://github.com/apache/flink-docker >> > > > > > > > https://github.com/apache/couchdb & >> > > > > > > > https://github.com/apache/couchdb-docker >> > > > > > > > >> > > > > > > > Regards, >> > > > > > > > Kaxil >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > On Wed, Jul 1, 2020 at 9:50 PM Jarek Potiuk < >> > > > > jarek.pot...@polidea.com> >> > > > > > > > wrote: >> > > > > > > > >> > > > > > > > > I do not think it's only the question of Mono/Multi repos. >> > > While >> > > > I >> > > > > > > > clearly >> > > > > > > > > see the benefit of separate repos I also see some >> drawbacks. >> > > > > > > > > >> > > > > > > > > And if it bothers others, I am happy to follow the >> majority. >> > If >> > > > we >> > > > > > > think >> > > > > > > > > that a bit more complexity in testing justifies separating >> > > those >> > > > > three >> > > > > > > > > completely and having more "clean"- it's also >> workable but >> > IMHO >> > > > > > > > introduces >> > > > > > > > > certain complexity in development. >> > > > > > > > > >> > > > > > > > > However I think this is not 0/1 a kind of Hybrid approach >> in >> > my >> > > > > opinion >> > > > > > > > > might be best of both worlds - development and >> releases . 
>> > > > > > > > > >> > > > > > > > > Let me explain what I mean by "Hybrid": >> > > > > > > > > >> > > > > > > > > I think we definitely should have separate >> repositories to >> > > > release >> > > > > > > those >> > > > > > > > > artifacts and I think there is no doubt about it: >> > > > > > > > > >> > > > > > > > > * airflow (apache/airflow) >> > > > > > > > > * prod docker image (apache/airflow-docker) >> > > > > > > > > * helm chart (apache/airflow-helm) >> > > > > > > > > * api clients (we already have separate repos for those) >> > > > > > > > > (apache/airflow-client-*) >> > > > > > > > > >> > > > > > > > > I think the only question is where we develop all those >> > > (develop >> > > > != >> > > > > > > > > release). There are certain benefits of having a single >> > > "master" >> > > > > (let's >> > > > > > > > > call it "development" further) for all those artifacts. >> > > Currently >> > > > > the >> > > > > > > > > "development" version for all of those is in one repo >> - and >> > > while >> > > > > > > > > developing one depends on the other, we also test all of >> > those >> > > > > together >> > > > > > > > and >> > > > > > > > > this means that "current best" set of airflow sources >> > > (including >> > > > > > > > > dependencies in setup.py), Dockerfile and Helm chart work. >> > This >> > > > > means >> > > > > > > for >> > > > > > > > > example that you will not be able to break the Helm Chart >> by >> > > > > changing >> > > > > > > > > anything that the helm chart depends on in airflow. For >> > example >> > > > if >> > > > > you >> > > > > > > > > change "airflow webserver" into "airflow server" the >> current >> > > helm >> > > > > chart >> > > > > > > > > will break. 
Similarly if you change entrypoint,sh in Docker >> > > image >> > > > > in a >> > > > > > > > way >> > > > > > > > > that is not compatible with Helm chart, we will not let >> that >> > > > > happen - >> > > > > > > the >> > > > > > > > > CI tests will break if either of those changes in an >> > > incompatible >> > > > > way. >> > > > > > > > And >> > > > > > > > > we can have dependencies in any direction between those >> > three. >> > > > > When we >> > > > > > > > see >> > > > > > > > > a commit break either of the three - we can make a decision >> > > about >> > > > > what >> > > > > > > to >> > > > > > > > > do - either accept and document the incompatibility >> or fix >> > it. >> > > > > > > > > >> > > > > > > > > Of course keeping that property (testing it all together) >> is >> > > also >> > > > > > > > possible >> > > > > > > > > if they are in completely separate repos. There are several >> > > > > > > > > cross-dependencies - Docker image building depends on >> > > > dependencies >> > > > > in >> > > > > > > > > setup.py for example, you cannot build Docker image from >> only >> > > > > > > Dockerfile >> > > > > > > > > without the sources of airflow nor build and test helm >> charts >> > > > > without >> > > > > > > the >> > > > > > > > > image (and sources - because that's where the current >> > > kubernetes >> > > > > tests >> > > > > > > > > are). If we want to continue doing it for both Helm and >> > > > > Dockerfile, we >> > > > > > > > > would have to basically check out the latest sources of >> > Airflow >> > > > > and run >> > > > > > > > the >> > > > > > > > > CI tests before merging any Docker or Helm Chart changes >> and >> > > the >> > > > > > > > opposite - >> > > > > > > > > we will have to download Dockerfile/Helm chart and build >> > > > > image/install >> > > > > > > > Helm >> > > > > > > > > chart when we are running CI tests for Airflow. 
This is >> > > possible >> > > > > and we >> > > > > > > > > could do it, but it adds complexity to the build/CI >> process. >> > > > > > > > > >> > > > > > > > > Having such split also makes some updates more >> difficult - >> > for >> > > > > example >> > > > > > > if >> > > > > > > > > we add new "extra" to Airflow that will require to install >> > > "apt" >> > > > > > > > dependency >> > > > > > > > > in Dockerfile, we will have to split it into first adding >> the >> > > > > > > dependency >> > > > > > > > to >> > > > > > > > > Dockerfile, and once it is merged, we can add the >> extra to >> > > > airflow >> > > > > with >> > > > > > > > > setup.py. This makes it quite difficult to test it together >> > > > though >> > > > > (the >> > > > > > > > > Dockerfile change can only be tested fully after >> merging it >> > to >> > > > > master). >> > > > > > > > Not >> > > > > > > > > mentioning complexity of managing different versions >> - your >> > > local >> > > > > > > > > development Dockerfile version vs sources of Airflow for >> > > example. >> > > > > > > Imagine >> > > > > > > > > switching between branches where you add two >> different apt >> > > > > dependencies >> > > > > > > > to >> > > > > > > > > the Dockerfile. There are more similar scenarios I can >> > imagine >> > > - >> > > > > > > > especially >> > > > > > > > > for parallel changes in those repos. >> > > > > > > > > >> > > > > > > > > This is of course doable to keep them separate, but >> it is >> > > quite a >> > > > > bit >> > > > > > > > more >> > > > > > > > > complex to set up (especially for a consistent development >> > > > > environment) >> > > > > > > > > when you have separate repos and prevent cross-breaking >> > changes >> > > > > might >> > > > > > > be >> > > > > > > > > more difficult. 
>> > > > > > > > > >> > > > > > > > > I believe that the best way is to continue developing >> > airflow + >> > > > > image + >> > > > > > > > > chart in one repo - airflow, but release them from those >> > > separate >> > > > > > > repos. >> > > > > > > > > >> > > > > > > > > Airflow source release does not have to contain neither >> > chart, >> > > > nor >> > > > > > > image. >> > > > > > > > > And even if it contains sources for those, they are >> not the >> > > final >> > > > > > > > > "artifacts" (installable image and installable helm chart). >> > > > > > > > > Whenever we decide to release either of them - we >> test it >> in >> > > > > > > > "development". >> > > > > > > > > Then only when it is tested, we copy the sources to those >> > > > separate >> > > > > > > repos >> > > > > > > > > and release them. >> > > > > > > > > >> > > > > > > > > With git - we can even do it very easily while preserving >> > > history >> > > > > of >> > > > > > > > > commits easily (been there, done that). And then we could >> > > release >> > > > > Helm >> > > > > > > > and >> > > > > > > > > Docker image separately based on the commits and tags in >> > those >> > > > > separate >> > > > > > > > > repositories. >> > > > > > > > > >> > > > > > > > > I agree that separate repos is a more "clean" approach. >> But I >> > > > > think it >> > > > > > > is >> > > > > > > > > less convenient for development consistency. >> > > > > > > > > >> > > > > > > > > J, >> > > > > > > > > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > > On Wed, Jul 1, 2020 at 9:35 PM Kaxil Naik < >> > kaxiln...@gmail.com >> > > > >> > > > > wrote: >> > > > > > > > > >> > > > > > > > > > Forgot to mention, having them in separate repo also >> helps >> > in >> > > > > better >> > > > > > > > > > managing each individual artifacts. 
>> > > > > > > > > > >> > > > > > > > > > Each repo would have a separate Github Issue where >> we can >> > > track >> > > > > the >> > > > > > > > issue >> > > > > > > > > > specific to Helm chart or Dockerfile. >> > > > > > > > > > >> > > > > > > > > > Regards, >> > > > > > > > > > Kaxil >> > > > > > > > > > >> > > > > > > > > > On Wed, Jul 1, 2020 at 8:30 PM Kaxil Naik < >> > > kaxiln...@gmail.com >> > > > > >> > > > > > > wrote: >> > > > > > > > > > >> > > > > > > > > > > The PMC also needs to agree if we want separate VOTING >> > for >> > > > > Docker >> > > > > > > > Image >> > > > > > > > > > > and Helm chart, I think we do. >> > > > > > > > > > > >> > > > > > > > > > > Regards, >> > > > > > > > > > > Kaxil >> > > > > > > > > > > >> > > > > > > > > > > On Wed, Jul 1, 2020 at 8:06 PM Kaxil Naik < >> > > > kaxiln...@gmail.com >> > > > > > >> > > > > > > > wrote: >> > > > > > > > > > > >> > > > > > > > > > >> Hi all, >> > > > > > > > > > >> >> > > > > > > > > > >> What do you all think about having Dockerfile >> and Helm >> > > chart >> > > > > in >> > > > > > > the >> > > > > > > > > same >> > > > > > > > > > >> "Airflow" Repo vs separate? >> > > > > > > > > > >> >> > > > > > > > > > >> I feel having a separate repo for Airflow Dockerfile >> and >> > > > Helm >> > > > > > > chart >> > > > > > > > > have >> > > > > > > > > > >> more benefits like easy to track changes (via >> > Changelog), >> > > > > easy for >> > > > > > > > new >> > > > > > > > > > >> contributors, separate release cadence. >> > > > > > > > > > >> >> > > > > > > > > > >> Currently, docker file and Helm Chart are inside the >> > same >> > > > > repo and >> > > > > > > > > when >> > > > > > > > > > >> we release changelog for a new Airflow version, it >> would >> > > > > include >> > > > > > > all >> > > > > > > > > > >> changes (Airflow + Dockerfile + Helm chart) >> which I >> > think >> > > is >> > > > > not >> > > > > > > > that >> > > > > > > > > > great. 
>> > > > > > > > > > >> >> > > > > > > > > > >> Also having them all inside a single repo means >> changes >> > in >> > > > > Helm >> > > > > > > > Chart >> > > > > > > > > > and >> > > > > > > > > > >> Dockerfile can block Airflow release. We could use >> > stable >> > > > Helm >> > > > > > > Chart >> > > > > > > > > > >> version and Dockerfile version to test Airflow >> so that >> > > they >> > > > > are >> > > > > > > > > > blockers to >> > > > > > > > > > >> release too. >> > > > > > > > > > >> >> > > > > > > > > > >> Happy to hear the thoughts from the community. >> > > > > > > > > > >> >> > > > > > > > > > >> Regards, >> > > > > > > > > > >> Kaxil >> > > > > > > > > > >> >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > > -- >> > > > > > > > > >> > > > > > > > > Jarek Potiuk >> > > > > > > > > Polidea <https://www.polidea.com/> | Principal Software >> > > Engineer >> > > > > > > > > >> > > > > > > > > M: +48 660 796 129 <+48660796129> >> > > > > > > > > [image: Polidea] <https://www.polidea.com/> >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > -- >> > > > > > > >> > > > > > > Jarek Potiuk >> > > > > > > Polidea <https://www.polidea.com/> | Principal Software >> Engineer >> > > > > > > >> > > > > > > M: +48 660 796 129 <+48660796129> >> > > > > > > [image: Polidea] <https://www.polidea.com/> >> > > > > > >> > > > > > >> > > > > > >> > > > > > -- >> > > > > > >> > > > > > Jarek Potiuk >> > > > > > Polidea <https://www.polidea.com/> | Principal Software Engineer >> > > > > > >> > > > > > M: +48 660 796 129 <+48660796129> >> > > > > > [image: Polidea] <https://www.polidea.com/> >> > > > >> > > > >> > > > >> > > > -- >> > > > >> > > > Jarek Potiuk >> > > > Polidea <https://www.polidea.com/> | Principal Software Engineer >> > > > >> > > > M: +48 660 796 129 <+48660796129> >> > > > [image: Polidea] <https://www.polidea.com/> >> > > > >> > > >> > >> >> >> -- >> >> Jarek Potiuk >> 