I do not think it's only the question of Mono/Multi repos. While I clearly
see the benefit of separate repos I also see some drawbacks.

And if it bothers others, I am happy to follow the majority. If we think
that separating those three completely and having a more "clean" setup is
worth a bit more complexity in testing - it's also workable, but IMHO it
introduces certain complexity in development.

However, I think this is not a 0/1 choice. A kind of Hybrid approach might,
in my opinion, be the best of both worlds - development and releases.

Let me explain what I mean by "Hybrid":

I think we definitely should have separate repositories to release those
artifacts and I think there is no doubt about it:

* airflow (apache/airflow)
* prod docker image (apache/airflow-docker)
* helm chart (apache/airflow-helm)
* api clients (apache/airflow-client-*) - we already have separate repos
  for those

I think the only question is where we develop all those (develop !=
release). There are certain benefits of having a single "master" (let's
call it "development" further) for all those artifacts. Currently the
"development" version for all of those is in one repo - and while
developing one depends on the other, we also test all of those together and
this means that the "current best" set of airflow sources (including
dependencies in setup.py), Dockerfile and Helm chart work together. This means for
example that you will not be able to break the Helm Chart by changing
anything that the helm chart depends on in airflow. For example if you
change "airflow webserver" into "airflow server" the current helm chart
will break. Similarly, if you change entrypoint.sh in the Docker image in a way
that is not compatible with Helm chart, we will not let that happen - the
CI tests will break if either of those changes in an incompatible way. And
we can have dependencies in any direction between those three. When we see
a commit break either of the three - we can make a decision about what to
do - either accept and document the incompatibility or fix it.
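
To illustrate the kind of cross-check that testing everything together
gives us, here is a minimal Python sketch. It is not how our CI is actually
implemented - the function name and the lists of commands are hypothetical
and only mirror the "airflow webserver" -> "airflow server" example above.

```python
# Hypothetical sketch: detect when the chart refers to CLI commands
# that the current airflow sources no longer provide.

def find_broken_commands(chart_commands, airflow_commands):
    """Return the commands used by the chart that airflow does not offer."""
    return sorted(set(chart_commands) - set(airflow_commands))


# Assumed example data, not taken from the real chart or the real CLI.
chart_uses = ["airflow webserver", "airflow scheduler"]
airflow_before = ["airflow webserver", "airflow scheduler", "airflow worker"]
airflow_after_rename = ["airflow server", "airflow scheduler", "airflow worker"]

print(find_broken_commands(chart_uses, airflow_before))
print(find_broken_commands(chart_uses, airflow_after_rename))
```

An empty result means the chart and the sources still fit together; a
non-empty one is the point where we decide whether to fix the change or to
accept and document the incompatibility.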

Of course keeping that property (testing it all together) is also possible
if they are in completely separate repos. There are several
cross-dependencies - Docker image building depends on dependencies in
setup.py for example, you cannot build Docker image from only Dockerfile
without the sources of airflow nor build and test helm charts without the
image (and sources - because that's where the current kubernetes tests
are). If we want to continue doing it for both Helm and Dockerfile, we
would have to basically check out the latest sources of Airflow and run the
CI tests before merging any Docker or Helm Chart changes and the opposite -
we will have to download Dockerfile/Helm chart and build image/install Helm
chart when we are running CI tests for Airflow. This is possible and we
could do it, but it adds complexity to the build/CI process.
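
A rough Python sketch of what this means for CI with separate repos. The
dependency map below is only my reading of the cross-dependencies described
above (using the repository names from the list), and the function name is
made up for the example.

```python
# Which other repos does the CI of a given repo need? (assumed mapping)
NEEDS = {
    "airflow": {"airflow-docker", "airflow-helm"},
    "airflow-docker": {"airflow"},
    "airflow-helm": {"airflow-docker", "airflow"},
}


def repos_needed_for_ci(changed_repo, needs):
    """Return every repo that has to be checked out to test `changed_repo`."""
    seen = set()
    stack = [changed_repo]
    while stack:
        repo = stack.pop()
        if repo in seen:
            continue
        seen.add(repo)
        # Follow the dependencies of this repo as well (transitively).
        stack.extend(needs.get(repo, ()))
    return sorted(seen)


for repo in NEEDS:
    print(repo, "->", repos_needed_for_ci(repo, NEEDS))
```

With this mapping every change, no matter in which repo, ends up needing
all three checkouts.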

Having such a split also makes some updates more difficult - for example if
we add a new "extra" to Airflow that requires installing an "apt" dependency
in the Dockerfile, we will have to split it into first adding the dependency to
the Dockerfile, and once it is merged, we can add the extra to airflow in
setup.py. This makes it quite difficult to test it together though (the
Dockerfile change can only be tested fully after merging it to master). Not
to mention the complexity of managing different versions - your local
development Dockerfile version vs sources of Airflow for example. Imagine
switching between branches where you add two different apt dependencies to
the Dockerfile. There are more similar scenarios I can imagine - especially
for parallel changes in those repos.
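
The "apt dependency first, extra second" ordering can be sketched with the
standard library graphlib module (Python 3.9+). The change descriptions are
made up for this example; the only point is that with separate repos a
single logical change turns into an ordered sequence of merges.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each change maps to the set of changes that must be merged before it.
# The change descriptions are hypothetical.
must_merge_first = {
    "airflow: add the new extra to setup.py": {
        "airflow-docker: add the apt dependency to the Dockerfile",
    },
}

merge_order = list(TopologicalSorter(must_merge_first).static_order())
for step, change in enumerate(merge_order, start=1):
    print(step, change)
```

In a single repo both changes could go into one commit and be tested
together before merging.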

It is of course doable to keep them separate, but it is quite a bit more
complex to set up (especially for a consistent development environment)
when you have separate repos, and preventing cross-breaking changes might be
more difficult.

I believe that the best way is to continue developing airflow + image +
chart in one repo - airflow, but release them from those separate repos.

The Airflow source release does not have to contain either the chart or the image.
And even if it contains sources for those, they are not the final
"artifacts" (installable image and installable helm chart).
Whenever we decide to release either of them - we test it in "development".
Then only when it is tested, we copy the sources to those separate repos
and release them.

With git we can even do it very easily while preserving the history of
commits (been there, done that). And then we could release Helm and
Docker image separately based on the commits and tags in those separate
repositories.
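
As a rough sketch of one possible way to do such a copy: the snippet below
only builds and prints the git commands, it does not run them. It assumes
"git subtree split" is used (other tools would work too), and the directory
name, branch names and URL are placeholders chosen for the example.

```python
import shlex


def history_preserving_copy_commands(subdir, branch, remote_url, target_branch):
    """Build (but do not run) git commands that copy `subdir` with its history."""
    return [
        # Extract the commits touching `subdir` into a separate local branch.
        ["git", "subtree", "split", f"--prefix={subdir}", "-b", branch],
        # Push that branch to the release repository.
        ["git", "push", remote_url, f"{branch}:{target_branch}"],
    ]


# Placeholder values, assumed only for this example.
commands = history_preserving_copy_commands(
    subdir="chart",
    branch="helm-release",
    remote_url="https://github.com/apache/airflow-helm.git",
    target_branch="master",
)
for command in commands:
    print(shlex.join(command))
```

The release tags would then be put on the commits in the separate
repository, not in the development one.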

I agree that separate repos is a more "clean" approach. But I think it is
less convenient for development consistency.

J,



On Wed, Jul 1, 2020 at 9:35 PM Kaxil Naik <kaxiln...@gmail.com> wrote:

> Forgot to mention, having them in separate repo also helps in better
> managing each individual artifacts.
>
> Each repo would have a separate Github Issue where we can track the issue
> specific to Helm chart or Dockerfile.
>
> Regards,
> Kaxil
>
> On Wed, Jul 1, 2020 at 8:30 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
>
> > The PMC also needs to agree if we want separate VOTING for Docker Image
> > and Helm chart, I think we do.
> >
> > Regards,
> > Kaxil
> >
> > On Wed, Jul 1, 2020 at 8:06 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
> >
> >> Hi all,
> >>
> >> What do you all think about having Dockerfile and Helm chart in the same
> >> "Airflow" Repo vs separate?
> >>
> >> I feel having a separate repo for Airflow Dockerfile and Helm chart have
> >> more benefits like easy to track changes (via Changelog), easy for new
> >> contributors, separate release cadence.
> >>
> >> Currently, docker file and Helm Chart are inside the same repo and when
> >> we release changelog for a new Airflow version, it would include all
> >> changes (Airflow + Dockerfile + Helm chart) which I think is not that
> >> great.
> >>
> >> Also having them all inside a single repo means changes in Helm Chart
> >> and Dockerfile can block Airflow release. We could use stable Helm Chart
> >> version and Dockerfile version to test Airflow so that they are
> >> blockers to release too.
> >>
> >> Happy to hear the thoughts from the community.
> >>
> >> Regards,
> >> Kaxil
> >>
> >
>


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>