Absolutely - I am happy to add "best practices" and a short "how to work
with git submodules" guide - and this knowledge will only be needed for
interacting with the prod image/Helm chart/running Kubernetes tests. For all
other purposes it should be "business as usual".
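
As a rough starting point for such a how-to, here is a minimal sketch of the
chart + airflow workflow. It is an illustration under assumptions, not the
agreed setup: throwaway local repos stand in for apache/airflow and a separate
chart repo, the "chart" folder name and the `g` helper are made up for the
demo, and protocol.file.allow is set only because local paths are used as
remotes.

```shell
set -eu
work="$(mktemp -d)"

# Hypothetical helper: git with a throwaway identity. protocol.file.allow is
# only needed because this demo uses local paths instead of real remotes.
g() {
  git -c user.name=demo -c user.email=demo@example.com \
      -c protocol.file.allow=always "$@"
}

# Stand-ins for the chart repo and the airflow repo on the server side.
g init -q "$work/chart-seed"
g -C "$work/chart-seed" commit -q --allow-empty -m "chart: initial commit"
g clone -q --bare "$work/chart-seed" "$work/chart-origin.git"
g init -q --bare "$work/airflow-origin.git"

# Local airflow checkout with the chart repo linked under chart/.
g init -q "$work/airflow"
cd "$work/airflow"
g remote add origin "$work/airflow-origin.git"
g submodule --quiet add "$work/chart-origin.git" chart
g commit -q -m "Link the helm chart as a submodule"

# Change the chart inside the linked folder, then record the new commit
# hash in the airflow repo.
g -C chart commit -q --allow-empty -m "chart: example change"
g add chart
g commit -q -m "Point chart at the example change"

# "check" refuses to push airflow while the chart commit is only local.
if g push -q --recurse-submodules=check origin HEAD 2>/dev/null; then
  echo "airflow pushed"
else
  echo "push refused: chart commit is not on its remote yet"
fi

# Push the chart first, then the airflow push goes through.
g -C chart push -q origin HEAD
g push -q --recurse-submodules=check origin HEAD
g submodule status
```

One thing the how-to should mention: after "git submodule update" the linked
folder is normally on a detached HEAD, which is why you check out a branch
there before committing.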

On Thu, Jul 2, 2020 at 4:53 PM Daniel Imberman <daniel.imber...@gmail.com>
wrote:

> I think git submodules sound like a great idea. We would need to write
> this into the CONTRIBUTING.md to let people know how to do it, but it’s a
> “teach once” situation.
>
> On Thu, Jul 2, 2020 at 2:44 AM, Tomasz Urbaszek <turbas...@apache.org>
> wrote:
> I support the idea of separate repos. The git submodules mentioned by
> Jarek sound like an interesting solution. They may add some complexity
> for new contributors but it's not rocket science. If we agree on using
> them we should add a small how-to in contributing.rst I think (e.g. do I
> have to have a fork of each repo?).
>
> As stressed previously if we go this route we should make sure we have
> nice testing of all those three components. Regarding the versioning,
> I have no strong opinion but I fully support using separate issues for
> airflow, docker, and helm.
>
> Tomek
>
>
> On Thu, Jul 2, 2020 at 9:26 AM Jarek Potiuk <jarek.pot...@polidea.com>
> wrote:
> >
> > On Thu, Jul 2, 2020 at 3:16 AM Daniel Imberman <
> daniel.imber...@gmail.com>
> > wrote:
> >
> > > I’m fine with keeping it as three separate repos but merging testing
> > > somehow (e.g. the source code chart would pull the helm/docker chart into
> > > .build) but we need to do it in a way that doesn’t make testing too
> > > difficult.
> > >
> > > So for example: How do I test/integration test a change that involves a
> > > change to all three and has to be done at the same time? Perhaps a user can
> > > “register” a branch of helm and docker when they start up breeze? Or
> > > perhaps we create a “parent” integration test that uses the three together?
> > >
> >
> > Yes, those are exactly my concerns when splitting the repos.
> >
> > I think testing for development should remain in the "airflow" repo. It is
> > the "central one" in fact. I slept on it and I think using "released"
> > versions for development testing will suffer from this "we need a change in
> > all three of those" problem.
> >
> > But we have an easy solution I think.
> >
> > I think that simply setting up submodules properly should do the job:
> > https://git-scm.com/book/en/v2/Git-Tools-Submodules. They seem to be
> > perfect for our case.
> >
> > For those who have not used them - in short - submodules work in such a way
> > that they register the "linked repos" and store the related "hash" of the
> > commit from that linked repo. For example, the "chart" folder will be a link
> > to "apache/airflow-helm-chart". We can also move the prod Dockerfile to a
> > subfolder and link it to the separate repo. Git submodule has a
> > built-in mechanism to a) update to the latest version of the repo, b)
> > commit your changes to the linked repo from there, which is all we need. I
> > have used them a few times - I never liked submodules for sharing "library"
> > code, but for sharing helm/Docker it seems perfect.
> >
> > From the "regular" developer point of view - you do not need to get/update
> > submodules if you do not need to use them - so for all the development
> > purposes if you only change the "airflow" code, you would not even need to
> > sync chart or Dockerfile. You do "git checkout" as usual and it should
> > work. So basically - no change for "regular" airflow development.
> >
> > However, if you do need to work on helm + Docker + code, then you simply do
> > "git submodule update", go to the linked "helm" or "docker" folder,
> > check out the "master" version and start making changes. The only thing
> > to remember when you want to push your changes is to do
> > `git push --recurse-submodules=check`, which will make sure that all the
> > repos are updated (the push is refused if a submodule commit you reference
> > has not been pushed yet). It is a bit involved, but the latest git versions
> > have very good support for it, and it must only be used by people who work
> > on airflow + docker + helm - all the others are unaffected.
> >
> > From the CI perspective also nothing changes - when we check out the code we
> > will include submodules and our test harness will be largely unchanged.
> > Submodules provide us with the right mechanism for cross-dependency even if
> > we use branches.
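
To make the two cases above concrete (regular development vs. CI or chart
work), here is a minimal sketch. It assumes throwaway local repos in place of
the real ones, a made-up Chart.yaml file, and a hypothetical `g` helper;
protocol.file.allow is set only because the demo clones from local paths.

```shell
set -eu
work="$(mktemp -d)"

# Hypothetical helper: git with a throwaway identity. protocol.file.allow is
# only needed because this demo uses local paths instead of real remotes.
g() {
  git -c user.name=demo -c user.email=demo@example.com \
      -c protocol.file.allow=always "$@"
}

# Stand-in chart repo with one file, and an airflow repo that links it.
g init -q "$work/chart-origin"
echo "name: airflow" > "$work/chart-origin/Chart.yaml"
g -C "$work/chart-origin" add Chart.yaml
g -C "$work/chart-origin" commit -q -m "chart: initial commit"
g init -q "$work/airflow-origin"
g -C "$work/airflow-origin" submodule --quiet add "$work/chart-origin" chart
g -C "$work/airflow-origin" commit -q -m "Link the helm chart as a submodule"

# Regular development: a plain clone leaves chart/ as an empty folder.
g clone -q "$work/airflow-origin" "$work/dev"
if [ -z "$(ls -A "$work/dev/chart")" ]; then
  echo "dev: chart/ is empty until you ask for it"
fi

# Only when the chart is needed: fetch it at the recorded commit.
g -C "$work/dev" submodule --quiet update --init chart
ls "$work/dev/chart"

# CI: clone airflow and its submodules in one step.
g clone -q --recurse-submodules "$work/airflow-origin" "$work/ci"
ls "$work/ci/chart"
```
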
> >
> > If everyone is ok with that - I am happy to set it up. With submodules
> > we can switch to separate repos even without releasing helm and Prod
> > chart "officially".
> >
> > J.
> >
> >
> >
> > >
> > > On Wed, Jul 1, 2020 at 3:20 PM, Jarek Potiuk <jarek.pot...@polidea.com
> >
> > > wrote:
> > > Sure. We can work with such an approach. There will be some dependencies
> > > that we might find are problematic, but if we all see that it's
> > > worth trying, there is a clear benefit that it makes for a "clean"
> > > split between those different "entities". And possibly once we release
> > > first versions of both image and chart, such problems will be rare and easy
> > > to fix.
> > >
> > > I personally think such a split is inevitable eventually, it's just a matter
> > > of when to do it. If we decide to make this happen soon - I am more than
> > > happy to work on making the split a reality.
> > >
> > > One prerequisite to that is that all those - Helm Chart, Prod Image and
> > > Airflow are released in stable versions separately "officially" - from
> the
> > > current sources (otherwise there will be no way to test cross-repo).
> > >
> > > I think for that we will need to agree on the versioning scheme and
> cadence
> > > for the Image and Helm Chart, then copy sources from airflow and
> release
> > > them as "baseline" including setup the tests for all of those - then we
> > > can remove both Helm and Dockerfile from the airflow repo. Happy to
> help
> > > with that if that's the direction we choose as a community. It is important
> > > though that we keep the cross-repo testing working. We have it working as
> > > of yesterday, so now the matter is - whatever we do, we keep it running and
> > > have the development environment support easy development and testing of
> > > any of the three (including CI testing cross-repos). That's the only
> > > really important thing to me - the rest is more of a technicality of how we
> > > link the repos, but the principle remains.
> > >
> > > Do we have an idea for the versioning scheme that we would like to use
> for
> > > the Helm Chart and prod image ?
> > >
> > > Should we make it CalVer <https://calver.org/overview.html> or SemVer
> > > <https://semver.org/> (or some other scheme)? And how should we treat
> the
> > > combinations with Airflow?
> > >
> > > My thoughts (but I have no strong opinions as long as someone proposes
> more
> > > sensible versioning schemes):
> > >
> > > 1) Airflow code - we continue the release scheme we have (with
> deciding on
> > > 2.* scheme for the release). I expect in the future we might decide on
> > > doing branches or patches so for 2.* I'd opt for going full SemVer
> approach
> > > and patches released from branches.
> > >
> > > 2) I believe that Helm Chart can be versioned with its own version
> (then
> > > you specify the image version as helm parameter). For the Helm Chart I
> > > think CalVer might be OK as I do not expect any branching/patches in
> the
> > > future - I'd expect that there will be a single stream of releases.
> > >
> > > 3) Dockerfile (+ related files such as .dockerignore, empty dir,
> > > entrypoints etc). I do not imagine a lot of branching for those - we
> > > should be able to release a new version of a Dockerfile (+ related files)
> > > working with nearly any earlier Airflow release, so CalVer seems like a
> > > good choice.
> > >
> > > 4) Image versioning becomes a bit more complex because the image tag is
> > > always a combination of:
> > > * Dockerfile (+ related files) version
> > > * Airflow Version
> > > * Python Version
> > >
> > > An example versioning I can imagine:
> > >
> > > *Airflow*: 1.10.11, 1.10.12, 2.0.0, 2.1.0, 2.1.1 - patch level (if we
> > > decide to have patches).
> > > *Dockerfile: *2020.07.12, 2020.08.20...... -> depending when we release
> > > them
> > > *Helm Chart*: 2020.07.10, 2020.08.09 ...... Each Helm Chart has a
> minimum
> > > version of both Dockerfile and Airflow versions it works with.
> > >
> > > *Example Docker Image tags:*
> > > apache/airflow:dockerfile2020.07.10-airflow1.10.10-python3.6
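
A minimal sketch of how such a tag could be put together from the three
versions listed above. The `image_tag` helper is hypothetical and the tag
layout simply follows the example in this mail - nothing here is an agreed
naming scheme.

```shell
# Hypothetical helper: build an image tag from the Dockerfile version,
# the Airflow version and the Python version, in that order.
image_tag() {
  dockerfile_version="$1"
  airflow_version="$2"
  python_version="$3"
  printf 'apache/airflow:dockerfile%s-airflow%s-python%s\n' \
    "$dockerfile_version" "$airflow_version" "$python_version"
}

image_tag 2020.07.10 1.10.10 3.6
```
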
> > >
> > > WDYT?
> > >
> > > J,
> > >
> > >
> > > On Wed, Jul 1, 2020 at 11:12 PM Kaxil Naik <kaxiln...@gmail.com>
> wrote:
> > >
> > > > I think we should have "separate repos for development" too.
> > > >
> > > > 3 Repos in total:
> > > >
> > > > 1) apache/airflow
> > > > 2) apache/airflow-docker-image
> > > > 3) apache/airflow-helm-chart
> > > >
> > > >
> > > > (1) *apache/airflow* should use a pinned stable version of Airflow
> Helm
> > > > chart to run Kubernetes tests
> > > > (2) *apache/airflow* already has *Dockerfile.ci* file which it can
> use to
> > > > run airflow tests on docker images.
> > > > (3) *apache/airflow-docker-image *should use the latest available
> stable
> > > > version of airflow
> > > > (4) *apache/airflow-helm-chart *should use the latest available
> stable
> > > > version of airflow
> > > >
> > > > > Having such split also makes some updates more difficult - for example
> > > > > if we add new "extra" to Airflow that will require to install "apt"
> > > > > dependency in Dockerfile, we will have to split it into first adding
> > > > > the dependency to Dockerfile, and once it is merged, we can add the
> > > > > extra to airflow with setup.py.
> > > >
> > > >
> > > > Adding a new extra to setup.py would not (and should not) impact the
> > > > development of *apache/airflow-docker-image*
> > > > Once an RC is cut for apache/airflow or after a new version is
> released
> > > for
> > > > apache/airflow, we can work on supporting the new airflow version in
> the
> > > > Production Docker Image.
> > > > While doing that we can add all the libraries that are needed by the
> new
> > > > Airflow Version and we will have a clean commit history and
> changelog for
> > > > Docker image.
> > > >
> > > > We definitely do not need to work in parallel on both the repos. By doing
> > > > development in a separate repo we keep consistent "source" files and we
> > > > can release each artifact with a separate cadence. If someone discovers a
> > > > bug in a newly released Docker image, we should easily be able to cut a
> > > > new release with the patch without worrying about how development is
> > > > going in the apache/airflow repo.
> > > >
> > > >
> > > > *Apache Flink & Apache CouchDB* do it in a similar manner:
> > > >
> > > > https://github.com/apache/flink &
> https://github.com/apache/flink-docker
> > > > https://github.com/apache/couchdb &
> > > > https://github.com/apache/couchdb-docker
> > > >
> > > > Regards,
> > > > Kaxil
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Wed, Jul 1, 2020 at 9:50 PM Jarek Potiuk <
> jarek.pot...@polidea.com>
> > > > wrote:
> > > >
> > > > > I do not think it's only the question of Mono/Multi repos. While I
> > > > clearly
> > > > > see the benefit of separate repos I also see some drawbacks.
> > > > >
> > > > > And if it bothers others, I am happy to follow the majority. If we
> > > think
> > > > > that a bit more complexity in testing justifies separating those
> three
> > > > > completely and having more "clean"- it's also workable but IMHO
> > > > introduces
> > > > > certain complexity in development.
> > > > >
> > > > > However, I think this is not a 0/1 choice. A kind of hybrid approach
> > > > > might, in my opinion, be the best of both worlds - development and
> > > > > releases.
> > > > >
> > > > > Let me explain what I mean by "Hybrid":
> > > > >
> > > > > I think we definitely should have separate repositories to release
> > > those
> > > > > artifacts and I think there is no doubt about it:
> > > > >
> > > > > * airflow (apache/airflow)
> > > > > * prod docker image (apache/airflow-docker)
> > > > > * helm chart (apache/airflow-helm)
> > > > > * api clients (we already have separate repos for those)
> > > > > (apache/airflow-client-*)
> > > > >
> > > > > I think the only question is where we develop all those (develop !=
> > > > > release). There are certain benefits of having a single "master"
> (let's
> > > > > call it "development" further) for all those artifacts. Currently
> the
> > > > > "development" version for all of those is in one repo - and while
> > > > > developing one depends on the other, we also test all of those
> together
> > > > and
> > > > > this means that "current best" set of airflow sources (including
> > > > > dependencies in setup.py), Dockerfile and Helm chart work. This
> means
> > > for
> > > > > example that you will not be able to break the Helm Chart by
> changing
> > > > > anything that the helm chart depends on in airflow. For example if
> you
> > > > > change "airflow webserver" into "airflow server" the current helm
> chart
> > > > > will break. Similarly if you change entrypoint.sh in Docker image
> in a
> > > > way
> > > > > that is not compatible with Helm chart, we will not let that
> happen -
> > > the
> > > > > CI tests will break if either of those changes in an incompatible
> way.
> > > > And
> > > > > we can have dependencies in any direction between those three.
> When we
> > > > see
> > > > > a commit break either of the three - we can make a decision about
> what
> > > to
> > > > > do - either accept and document the incompatibility or fix it.
> > > > >
> > > > > Of course keeping that property (testing it all together) is also
> > > > possible
> > > > > if they are in completely separate repos. There are several
> > > > > cross-dependencies - Docker image building depends on dependencies
> in
> > > > > setup.py for example, you cannot build Docker image from only
> > > Dockerfile
> > > > > without the sources of airflow nor build and test helm charts
> without
> > > the
> > > > > image (and sources - because that's where the current kubernetes
> tests
> > > > > are). If we want to continue doing it for both Helm and
> Dockerfile, we
> > > > > would have to basically check out the latest sources of Airflow
> and run
> > > > the
> > > > > CI tests before merging any Docker or Helm Chart changes and the
> > > > opposite -
> > > > > we will have to download Dockerfile/Helm chart and build
> image/install
> > > > Helm
> > > > > chart when we are running CI tests for Airflow. This is possible
> and we
> > > > > could do it, but it adds complexity to the build/CI process.
> > > > >
> > > > > Having such split also makes some updates more difficult - for
> example
> > > if
> > > > > we add new "extra" to Airflow that will require to install "apt"
> > > > dependency
> > > > > in Dockerfile, we will have to split it into first adding the
> > > dependency
> > > > to
> > > > > Dockerfile, and once it is merged, we can add the extra to airflow
> with
> > > > > setup.py. This makes it quite difficult to test it together though
> (the
> > > > > Dockerfile change can only be tested fully after merging it to
> master).
> > > > Not
> > > > > mentioning complexity of managing different versions - your local
> > > > > development Dockerfile version vs sources of Airflow for example.
> > > Imagine
> > > > > switching between branches where you add two different apt
> dependencies
> > > > to
> > > > > the Dockerfile. There are more similar scenarios I can imagine -
> > > > especially
> > > > > for parallel changes in those repos.
> > > > >
> > > > > This is of course doable to keep them separate, but it is quite a
> bit
> > > > more
> > > > > complex to set up (especially for a consistent development
> environment)
> > > > > when you have separate repos and prevent cross-breaking changes
> might
> > > be
> > > > > more difficult.
> > > > >
> > > > > I believe that the best way is to continue developing airflow +
> image +
> > > > > chart in one repo - airflow, but release them from those separate
> > > repos.
> > > > >
> > > > > Airflow source release does not have to contain either the chart or the
> > > > > image.
> > > > > And even if it contains sources for those, they are not the final
> > > > > "artifacts" (installable image and installable helm chart).
> > > > > Whenever we decide to release either of them - we test it in
> > > > "development".
> > > > > Then only when it is tested, we copy the sources to those separate
> > > repos
> > > > > and release them.
> > > > >
> > > > > With git - we can even do it very easily while preserving history
> of
> > > > > commits easily (been there, done that). And then we could release
> Helm
> > > > and
> > > > > Docker image separately based on the commits and tags in those
> separate
> > > > > repositories.
> > > > >
> > > > > I agree that separate repos is a more "clean" approach. But I
> think it
> > > is
> > > > > less convenient for development consistency.
> > > > >
> > > > > J,
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Jul 1, 2020 at 9:35 PM Kaxil Naik <kaxiln...@gmail.com>
> wrote:
> > > > >
> > > > > > Forgot to mention, having them in separate repo also helps in
> better
> > > > > > managing each individual artifacts.
> > > > > >
> > > > > > Each repo would have a separate Github Issue where we can track
> the
> > > > issue
> > > > > > specific to Helm chart or Dockerfile.
> > > > > >
> > > > > > Regards,
> > > > > > Kaxil
> > > > > >
> > > > > > On Wed, Jul 1, 2020 at 8:30 PM Kaxil Naik <kaxiln...@gmail.com>
> > > wrote:
> > > > > >
> > > > > > > The PMC also needs to agree if we want separate VOTING for
> Docker
> > > > Image
> > > > > > > and Helm chart, I think we do.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Kaxil
> > > > > > >
> > > > > > > On Wed, Jul 1, 2020 at 8:06 PM Kaxil Naik <kaxiln...@gmail.com
> >
> > > > wrote:
> > > > > > >
> > > > > > >> Hi all,
> > > > > > >>
> > > > > > >> What do you all think about having Dockerfile and Helm chart
> in
> > > the
> > > > > same
> > > > > > >> "Airflow" Repo vs separate?
> > > > > > >>
> > > > > > >> I feel having a separate repo for Airflow Dockerfile and Helm chart
> > > > > > >> has more benefits, like changes that are easy to track (via
> > > > > > >> Changelog), an easier start for new contributors, and a separate
> > > > > > >> release cadence.
> > > > > > >>
> > > > > > >> Currently, docker file and Helm Chart are inside the same
> repo and
> > > > > when
> > > > > > >> we release changelog for a new Airflow version, it would
> include
> > > all
> > > > > > >> changes (Airflow + Dockerfile + Helm chart) which I think is
> not
> > > > that
> > > > > > great.
> > > > > > >>
> > > > > > >> Also having them all inside a single repo means changes in
> Helm
> > > > Chart
> > > > > > and
> > > > > > >> Dockerfile can block Airflow release. We could use stable Helm Chart
> > > > > > >> version and Dockerfile version to test Airflow so that they are not
> > > > > > >> blockers to the release.
> > > > > > >>
> > > > > > >> Happy to hear the thoughts from the community.
> > > > > > >>
> > > > > > >> Regards,
> > > > > > >> Kaxil
> > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Jarek Potiuk
> > > > > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > > >
> > > > > M: +48 660 796 129 <+48660796129>
> > > > > [image: Polidea] <https://www.polidea.com/>
> > > > >
> > > >
> > >
> > >
> >
> >
> >



-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>
