Absolutely - I am happy to add "best practices" and a short "how to do stuff with git submodules" guide. This knowledge will only be needed for working with the prod image, the Helm chart, or running the Kubernetes tests. For all other purposes it should be "business as usual".
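As a starting point for that how-to, here is a minimal sketch of the submodule flow, run entirely against throwaway local repositories. The names `chart-origin`, `airflow`, `fresh`, `my-chart-change` and `NOTES.txt` are stand-ins made up for illustration, not the real Apache repos or an agreed layout. The `protocol.file.allow` override is only there because recent git versions refuse local-path submodule URLs by default; it would not be needed with hosted repos.

```shell
#!/bin/sh
# Sketch only: every repository below is a local throwaway stand-in.
set -eu

work="$(mktemp -d)"

# Demo-only identity so commits work on a machine without git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# Helper: allow local-path submodule URLs for this demo only.
g() { git -c protocol.file.allow=always "$@"; }

# 1. Stand-in for the separate chart repo.
git init -q "$work/chart-origin"
git -C "$work/chart-origin" commit -q --allow-empty -m "Initial chart commit"

# 2. Stand-in for the main repo, linking the chart repo as ./chart.
git init -q "$work/airflow"
cd "$work/airflow"
git commit -q --allow-empty -m "Initial airflow commit"
g submodule --quiet add "$work/chart-origin" chart
git commit -q -m "Link chart as a submodule"

# 3. A regular clone does not fetch the submodule: ./chart stays empty,
#    so "airflow only" development is unchanged.
git clone -q "$work/airflow" "$work/fresh"
cd "$work/fresh"

# 4. Opt in only when you need the chart sources.
g submodule --quiet update --init

# 5. Change the chart, then record the new chart commit in the main repo.
cd chart
git checkout -q -b my-chart-change
echo "demo change" > NOTES.txt
git add NOTES.txt
git commit -q -m "Demo chart change"
cd ..
git add chart
git commit -q -m "Point airflow at the new chart commit"

# The main repo now pins the new chart commit hash.
git submodule status

# With real remotes, the push would be (not run here, no remote is set up):
#   git push --recurse-submodules=check
```

Steps 3 and 4 are the part most contributors would care about: nothing changes unless you explicitly initialise the submodule.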
On Thu, Jul 2, 2020 at 4:53 PM Daniel Imberman <daniel.imber...@gmail.com> wrote:

> I think git submodules sounds like a great idea. We would need to write this into the CONTRIBUTING.md to let people know how to do it, but it’s a “teach once” situation.
>
> On Thu, Jul 2, 2020 at 2:44 AM, Tomasz Urbaszek <turbas...@apache.org> wrote:
> I support the idea of separate repos. The git submodules mentioned by Jarek sound like an interesting solution. It may add some complexity for new contributors but it's not rocket science. If we agree on using this, I think we should add a small how-to in contributing.rst (e.g. do I have to have a fork of each repo?).
>
> As stressed previously, if we go this route we should make sure we have good testing of all three components. Regarding the versioning, I have no strong opinion but I fully support using separate issues for airflow, docker, and helm.
>
> Tomek
>
> On Thu, Jul 2, 2020 at 9:26 AM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> >
> > On Thu, Jul 2, 2020 at 3:16 AM Daniel Imberman <daniel.imber...@gmail.com> wrote:
> >
> > > I’m fine with keeping it as three separate repos but merging testing somehow (e.g. the source code chart would pull the helm/docker chart into .build) but we need to do it in a way that doesn’t make testing too difficult.
> > >
> > > So for example: How do I test/integration test a change that involves a change to all three and has to be done at the same time? Perhaps a user can “register” a branch of helm and docker when they start up breeze? Or perhaps we create a “parent” integration test that uses the three together?
> >
> > Yes, those are exactly my concerns when splitting the repos.
> >
> > I think testing for development should remain in the "airflow" repo. It is the "central one" in fact.
> > I slept on it and I think using "released" versions for development testing will suffer from this "we need a change in all three of those" problem.
> >
> > But we have an easy solution I think.
> >
> > I think that simply setting up submodules properly should do the job: https://git-scm.com/book/en/v2/Git-Tools-Submodules. They seem to be perfect for our case.
> >
> > For those who have not used them - in short - submodules work by registering the "linked repos" and storing the "hash" of the commit from each linked repo. For example, the "chart" folder will be a link to "apache/airflow-helm-chart". We can also move the prod Dockerfile to a subfolder and link it to the separate repo. Git submodule has a built-in mechanism to a) update to the latest version of the repo, b) commit your changes to the linked repo from there, which is all we need. I have used those a few times - I never liked submodules for sharing "library" code, but for sharing helm/Docker it seems perfect.
> >
> > From the "regular" developer point of view - you do not need to get/update submodules if you do not need to use them - so for all development purposes, if you only change the "airflow" code, you would not even need to sync the chart or Dockerfile. You do "git checkout" as usual and it should work. So basically - no change for "regular" airflow development.
> >
> > However, if you do need to work on helm + Docker + code, then you simply do "git submodule update", go to the linked "helm" or "docker" folder, checkout the "master" version and you start making changes.
> > The only thing to remember when you want to push your changes is to do `git push --recurse-submodules=check`: git will then refuse the push if a submodule commit referenced by your change has not been pushed to its own repo yet, so the repos cannot get out of sync. It is a bit involved, but recent git versions have very good support for it, and it only needs to be used by people who work on airflow + docker + helm - all the others are unaffected.
> >
> > From the CI perspective also nothing changes - when we checkout the code we will include submodules and our test harness will be largely unchanged. Submodules provide us with the right mechanism for cross dependency even if we use branches.
> >
> > If everyone is ok with that - I am happy to set it up. With submodules - we can switch to separate repos even without releasing helm and Prod chart "officially".
> >
> > J.
> >
> > > On Wed, Jul 1, 2020 at 3:20 PM, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> > > Sure. We can work with such an approach. There will be some dependencies that we might find problematic, but if we all see that it's worth trying, there is a clear benefit: it makes for a "clean" split between those different "entities". And possibly once we release first versions of both image and chart, such problems will be rare and easy to fix.
> > >
> > > I personally think such a split is inevitable eventually, it's just a matter of when to do it. If we decide to make this happen soon - I am more than happy to work on making the split reality.
> > >
> > > One prerequisite to that is that all those - Helm Chart, Prod Image and Airflow - are released in stable versions separately "officially" - from the current sources (otherwise there will be no way to test cross-repo).
> > >
> > > I think for that we will need to agree on the versioning scheme and cadence for the Image and Helm Chart, then copy sources from airflow and release them as "baseline", including setting up the tests for all of those - then we can remove both Helm and Dockerfile from the airflow repo. Happy to help with that if that's the direction we choose as a community. It is important though that we keep the cross-repo testing working. We have it working as of yesterday, so now the matter is - whatever we do, we keep it running and have the development environment support easy development and testing of any of the three (including CI testing cross-repos). That's the only really important thing to me - the rest is more of a technicality of how we link the repos, but the principle remains.
> > >
> > > Do we have an idea for the versioning scheme that we would like to use for the Helm Chart and prod image?
> > >
> > > Should we make it CalVer <https://calver.org/overview.html> or SemVer <https://semver.org/> (or some other scheme)? And how should we treat the combinations with Airflow?
> > >
> > > My thoughts (but I have no strong opinions as long as someone proposes more sensible versioning schemes):
> > >
> > > 1) Airflow code - we continue the release scheme we have (with deciding on 2.* scheme for the release). I expect in the future we might decide on doing branches or patches, so for 2.* I'd opt for going with a full SemVer approach and patches released from branches.
> > >
> > > 2) I believe that the Helm Chart can be versioned with its own version (then you specify the image version as a helm parameter). For the Helm Chart I think CalVer might be OK as I do not expect any branching/patches in the future - I'd expect that there will be a single stream of releases.
> > >
> > > 3) Dockerfile (+ related files such as .dockerignore, empty dir, entrypoints etc). I do not imagine a lot of branching for those - we should be able to release a new version of a Dockerfile (+ related files) working with nearly any earlier Airflow release, so CalVer seems like a good choice.
> > >
> > > 4) Image versioning becomes a bit more complex because the image tag is always a combination of:
> > > * Dockerfile (+ related files) version
> > > * Airflow Version
> > > * Python Version
> > >
> > > An example versioning I can imagine:
> > >
> > > *Airflow*: 1.10.11, 1.10.12, 2.0.0, 2.1.0, 2.1.1 - patch level (if we decide to have patches).
> > > *Dockerfile*: 2020.07.12, 2020.08.20...... -> depending when we release them
> > > *Helm Chart*: 2020.07.10, 2020.08.09 ...... Each Helm Chart has a minimum version of both Dockerfile and Airflow versions it works with.
> > >
> > > *Example Docker Image tags:*
> > > apache/airflow:dockerfile2020.07.10-airflow1.10.10-python3.6
> > >
> > > WDYT?
> > >
> > > J,
> > >
> > > On Wed, Jul 1, 2020 at 11:12 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
> > >
> > > > I think we should have "separate repos for development" too.
> > > >
> > > > 3 Repos in total:
> > > >
> > > > 1) apache/airflow
> > > > 2) apache/airflow-docker-image
> > > > 3) apache/airflow-helm-chart
> > > >
> > > > (1) *apache/airflow* should use a pinned stable version of Airflow Helm chart to run Kubernetes tests
> > > > (2) *apache/airflow* already has *Dockerfile.ci* file which it can use to run airflow tests on docker images.
> > > > (3) *apache/airflow-docker-image* should use the latest available stable version of airflow
> > > > (4) *apache/airflow-helm-chart* should use the latest available stable version of airflow
> > > >
> > > > > Having such split also makes some updates more difficult - for example if we add new "extra" to Airflow that will require to install "apt" dependency in Dockerfile, we will have to split it into first adding the dependency to Dockerfile, and once it is merged, we can add the extra to airflow with setup.py.
> > > >
> > > > Adding a new extra to setup.py would not (and should not) impact the development of *apache/airflow-docker-image*.
> > > > Once an RC is cut for apache/airflow or after a new version is released for apache/airflow, we can work on supporting the new airflow version in the Production Docker Image.
> > > > While doing that we can add all the libraries that are needed by the new Airflow Version and we will have a clean commit history and changelog for the Docker image.
> > > >
> > > > We definitely do not need to work in parallel on both the repos. By doing development in a separate repo we keep consistent "source" files and we can release each artifact with a separate cadence. If someone discovers a bug in a newly released Docker image, we should easily be able to cut a new release with the patch without worrying about how development is going in the apache/airflow repo.
> > > >
> > > > *Apache Flink & Apache CouchDB* do it in a similar manner:
> > > >
> > > > https://github.com/apache/flink & https://github.com/apache/flink-docker
> > > > https://github.com/apache/couchdb & https://github.com/apache/couchdb-docker
> > > >
> > > > Regards,
> > > > Kaxil
> > > >
> > > > On Wed, Jul 1, 2020 at 9:50 PM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> > > >
> > > > > I do not think it's only the question of Mono/Multi repos. While I clearly see the benefit of separate repos I also see some drawbacks.
> > > > >
> > > > > And if it bothers others, I am happy to follow the majority. If we think that a bit more complexity in testing justifies separating those three completely and having a more "clean" split - it's also workable but IMHO introduces certain complexity in development.
> > > > >
> > > > > However, I think this is not 0/1. A kind of "Hybrid" approach might, in my opinion, be the best of both worlds - development and releases.
> > > > >
> > > > > Let me explain what I mean by "Hybrid":
> > > > >
> > > > > I think we definitely should have separate repositories to release those artifacts and I think there is no doubt about it:
> > > > >
> > > > > * airflow (apache/airflow)
> > > > > * prod docker image (apache/airflow-docker)
> > > > > * helm chart (apache/airflow-helm)
> > > > > * api clients (we already have separate repos for those) (apache/airflow-client-*)
> > > > >
> > > > > I think the only question is where we develop all those (develop != release). There are certain benefits of having a single "master" (let's call it "development" further) for all those artifacts.
> > > > > Currently the "development" version for all of those is in one repo - and while developing one depends on the other, we also test all of those together, and this means that the "current best" set of airflow sources (including dependencies in setup.py), Dockerfile and Helm chart work. This means for example that you will not be able to break the Helm Chart by changing anything that the helm chart depends on in airflow. For example if you change "airflow webserver" into "airflow server" the current helm chart will break. Similarly if you change entrypoint.sh in the Docker image in a way that is not compatible with the Helm chart, we will not let that happen - the CI tests will break if either of those changes in an incompatible way. And we can have dependencies in any direction between those three. When we see a commit break any of the three - we can make a decision about what to do - either accept and document the incompatibility or fix it.
> > > > >
> > > > > Of course keeping that property (testing it all together) is also possible if they are in completely separate repos. There are several cross-dependencies - Docker image building depends on dependencies in setup.py for example, you cannot build the Docker image from only the Dockerfile without the sources of airflow, nor build and test helm charts without the image (and sources - because that's where the current kubernetes tests are).
> > > > > If we want to continue doing it for both Helm and Dockerfile, we would have to basically check out the latest sources of Airflow and run the CI tests before merging any Docker or Helm Chart changes, and the opposite - we will have to download the Dockerfile/Helm chart and build the image/install the Helm chart when we are running CI tests for Airflow. This is possible and we could do it, but it adds complexity to the build/CI process.
> > > > >
> > > > > Having such a split also makes some updates more difficult - for example if we add a new "extra" to Airflow that requires installing an "apt" dependency in the Dockerfile, we will have to split it into first adding the dependency to the Dockerfile, and once it is merged, we can add the extra to airflow with setup.py. This makes it quite difficult to test it together though (the Dockerfile change can only be tested fully after merging it to master). Not to mention the complexity of managing different versions - your local development Dockerfile version vs sources of Airflow for example. Imagine switching between branches where you add two different apt dependencies to the Dockerfile. There are more similar scenarios I can imagine - especially for parallel changes in those repos.
> > > > >
> > > > > Keeping them separate is of course doable, but it is quite a bit more complex to set up (especially for a consistent development environment) when you have separate repos, and preventing cross-breaking changes might be more difficult.
> > > > >
> > > > > I believe that the best way is to continue developing airflow + image + chart in one repo - airflow, but release them from those separate repos.
> > > > >
> > > > > The Airflow source release does not have to contain either the chart or the image. And even if it contains sources for those, they are not the final "artifacts" (installable image and installable helm chart).
> > > > > Whenever we decide to release either of them - we test it in "development". Then only when it is tested, we copy the sources to those separate repos and release them.
> > > > >
> > > > > With git - we can even do it very easily while preserving the history of commits (been there, done that). And then we could release Helm and Docker image separately based on the commits and tags in those separate repositories.
> > > > >
> > > > > I agree that separate repos is a more "clean" approach. But I think it is less convenient for development consistency.
> > > > >
> > > > > J,
> > > > >
> > > > > On Wed, Jul 1, 2020 at 9:35 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
> > > > >
> > > > > > Forgot to mention, having them in separate repos also helps in better managing each individual artifact.
> > > > > >
> > > > > > Each repo would have a separate Github Issue where we can track the issue specific to Helm chart or Dockerfile.
> > > > > >
> > > > > > Regards,
> > > > > > Kaxil
> > > > > >
> > > > > > On Wed, Jul 1, 2020 at 8:30 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
> > > > > >
> > > > > > > The PMC also needs to agree if we want separate VOTING for Docker Image and Helm chart, I think we do.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Kaxil
> > > > > > >
> > > > > > > On Wed, Jul 1, 2020 at 8:06 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
> > > > > > >
> > > > > > >> Hi all,
> > > > > > >>
> > > > > > >> What do you all think about having Dockerfile and Helm chart in the same "Airflow" Repo vs separate?
> > > > > > >>
> > > > > > >> I feel having a separate repo for Airflow Dockerfile and Helm chart have more benefits like easy to track changes (via Changelog), easy for new contributors, separate release cadence.
> > > > > > >>
> > > > > > >> Currently, docker file and Helm Chart are inside the same repo and when we release changelog for a new Airflow version, it would include all changes (Airflow + Dockerfile + Helm chart) which I think is not that great.
> > > > > > >>
> > > > > > >> Also having them all inside a single repo means changes in Helm Chart and Dockerfile can block Airflow release. We could use stable Helm Chart version and Dockerfile version to test Airflow so that they are blockers to release too.
> > > > > > >>
> > > > > > >> Happy to hear the thoughts from the community.
> > > > > > >>
> > > > > > >> Regards,
> > > > > > >> Kaxil
> > > > >
> > > > > --
> > > > > Jarek Potiuk
> > > > > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > > > M: +48 660 796 129 <+48660796129>

--
Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer
M: +48 660 796 129 <+48660796129>