Sure. We can work with such an approach. There will be some dependencies
that we might find problematic, but if we all see that it's
worth trying, there is a clear benefit: it makes for a "clean"
split between those different "entities". And possibly once we release
the first versions of both image and chart, such problems will be rare and
easy to fix.

I personally think such a split is inevitable eventually, it's just a matter
of when to do it. If we decide to make this happen soon - I am more than happy
to work on making the split a reality.

One prerequisite to that is that all those - Helm Chart, Prod Image and
Airflow are released in stable versions separately "officially" - from the
current sources (otherwise there will be no way to test cross-repo).

I think for that we will need to agree on the versioning scheme and cadence
for the Image and Helm Chart, then copy the sources from airflow and release
them as a "baseline", including setting up the tests for all of those - then
we can remove both Helm and Dockerfile from the airflow repo. Happy to help
with that if that's the direction we choose as a community. It is important
though that we keep the cross-repo testing working. We have it working as
of yesterday, so now the matter is - whatever we do, we keep it running and
have the development environment support easy development and testing of
any of the three (including CI testing cross-repos). That's the only
really important thing to me - the rest is more of a technicality of how we
link the repos, but the principle remains.

Do we have an idea for the versioning scheme that we would like to use for
the Helm Chart and prod image?

Should we make it CalVer <https://calver.org/overview.html> or SemVer
<https://semver.org/> (or some other scheme)? And how should we treat the
combinations with Airflow?

My thoughts (but I have no strong opinions as long as someone proposes more
sensible versioning schemes):

1) Airflow code - we continue the release scheme we have (with deciding on
2.* scheme for the release). I expect in the future we might decide on
doing branches or patches, so for 2.* I'd opt for a full SemVer approach
with patches released from branches.

2) I believe that the Helm Chart can be versioned with its own version (then
you specify the image version as a helm parameter). For the Helm Chart I
think CalVer might be OK as I do not expect any branching/patches in the
future - I'd expect that there will be a single stream of releases.

3) Dockerfile (+ related files such as .dockerignore, empty dir,
entrypoints etc). I do not imagine a lot of branching for those - we
should be able to release a new version of a Dockerfile (+ related files)
working with nearly any earlier Airflow release, so CalVer seems like a
good choice.

4) Image versioning becomes a bit more complex because the image tag is
always a combination of:
* Dockerfile (+ related files) version
* Airflow Version
* Python Version

An example versioning I can imagine:

*Airflow*: 1.10.11, 1.10.12, 2.0.0, 2.1.0, 2.1.1 - patch level (if we
decide to have patches).
*Dockerfile*: 2020.07.12, 2020.08.20, ... - depending on when we release them
*Helm Chart*: 2020.07.10, 2020.08.09, ... Each Helm Chart has a minimum
version of both Dockerfile and Airflow it works with.
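
To make the "minimum version" idea a bit more concrete, here is a minimal
sketch of how such a check could look. Everything in it is an assumption for
illustration only: the CHART_MINIMUMS table, the values in it and the
function names are hypothetical and not part of any existing tooling.

```python
from typing import Dict, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Turn a dotted CalVer/SemVer string into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split("."))


# Hypothetical table: Helm Chart version -> minimum versions it works with.
CHART_MINIMUMS: Dict[str, Dict[str, str]] = {
    "2020.08.09": {"dockerfile": "2020.07.12", "airflow": "1.10.11"},
}


def is_compatible(chart: str, dockerfile: str, airflow: str) -> bool:
    """Check that both versions meet the minimums declared for the chart."""
    minimums = CHART_MINIMUMS[chart]
    return (
        version_key(dockerfile) >= version_key(minimums["dockerfile"])
        and version_key(airflow) >= version_key(minimums["airflow"])
    )
```

Comparing tuples of ints rather than raw strings matters here, because a
plain string comparison would rank "1.10.9" above "1.10.11".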

*Example Docker Image tags:*
 apache/airflow:dockerfile2020.07.10-airflow1.10.10-python3.6
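
A rough sketch of how such a tag could be composed and parsed back into its
three parts is below. The ImageTag class, the regular expression and
parse_tag are hypothetical names I made up for illustration, and the pattern
only covers plain X.Y.Z Airflow versions (no release candidates), so treat
it as one possible shape rather than a proposal for the final format.

```python
import re
from typing import NamedTuple


class ImageTag(NamedTuple):
    """The three versions that make up an image tag (hypothetical helper)."""

    dockerfile: str  # CalVer, e.g. 2020.07.10
    airflow: str     # Airflow release, e.g. 1.10.10
    python: str      # Python major.minor, e.g. 3.6

    def __str__(self) -> str:
        return (
            f"dockerfile{self.dockerfile}"
            f"-airflow{self.airflow}"
            f"-python{self.python}"
        )


# Assumed pattern: YYYY.MM.DD for the Dockerfile, X.Y.Z for Airflow.
TAG_RE = re.compile(
    r"^dockerfile(?P<dockerfile>\d{4}\.\d{2}\.\d{2})"
    r"-airflow(?P<airflow>\d+\.\d+\.\d+)"
    r"-python(?P<python>\d+\.\d+)$"
)


def parse_tag(tag: str) -> ImageTag:
    """Split a tag string back into its Dockerfile/Airflow/Python parts."""
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"Unrecognized image tag: {tag!r}")
    return ImageTag(**match.groupdict())
```

Keeping the three parts explicit in the tag makes it possible to tell from
the tag alone which Dockerfile release, Airflow release and Python version
an image was built from.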

WDYT?

J,


On Wed, Jul 1, 2020 at 11:12 PM Kaxil Naik <kaxiln...@gmail.com> wrote:

> I think we should have "separate repos for development" too.
>
> 3 Repos in total:
>
> 1) apache/airflow
> 2) apache/airflow-docker-image
> 3) apache/airflow-helm-chart
>
>
> (1) *apache/airflow* should use a pinned stable version of Airflow Helm
> chart to run Kubernetes tests
> (2) *apache/airflow* already has *Dockerfile.ci* file which it can use to
> run airflow tests on docker images.
> (3) *apache/airflow-docker-image *should use the latest available stable
> version of airflow
> (4) *apache/airflow-helm-chart *should use the latest available stable
> version of airflow
>
> > Having such split also makes some updates more difficult - for example if
> > we add new "extra" to Airflow that will require to install "apt" dependency
> > in Dockerfile, we will have to split it into first adding the dependency to
> > Dockerfile, and once it is merged, we can add the extra to airflow with
> > setup.py.
>
>
> Adding a new extra to setup.py would not (and should not) impact the
> development of *apache/airflow-docker-image*
> Once an RC is cut for apache/airflow or after a new version is released for
> apache/airflow, we can work on supporting the new airflow version in the
> Production Docker Image.
> While doing that we can add all the libraries that are needed by the new
> Airflow Version and we will have a clean commit history and changelog for
> Docker image.
>
> We definitely do not need to work in parallel on both the repos. By doing
> development in a separate repo we keep consistent "source" files and we can
> release each artifact with a
> separate cadence. If someone discovers a bug in a newly released Docker image,
> we should be easily able to cut out a new release with the patch without
> worrying about how development is
> going in the apache/airflow repo.
>
>
> *Apache Flink & Apache CouchDB* do it in a similar manner:
>
> https://github.com/apache/flink & https://github.com/apache/flink-docker
> https://github.com/apache/couchdb &
> https://github.com/apache/couchdb-docker
>
> Regards,
> Kaxil
>
>
>
>
>
>
> On Wed, Jul 1, 2020 at 9:50 PM Jarek Potiuk <jarek.pot...@polidea.com>
> wrote:
>
> > I do not think it's only the question of Mono/Multi repos. While I clearly
> > see the benefit of separate repos I also see some drawbacks.
> >
> > And if it bothers others, I am happy to follow the majority. If we think
> > that a bit more complexity in testing justifies separating those three
> > completely and having a more "clean" split - it's also workable but IMHO
> > introduces certain complexity in development.
> >
> > However I think this is not 0/1 - a kind of Hybrid approach in my opinion
> > might be the best of both worlds - development and releases.
> >
> > Let me explain what I mean by "Hybrid":
> >
> > I think we definitely should have separate repositories to release those
> > artifacts and I think there is no doubt about it:
> >
> > * airflow (apache/airflow)
> > * prod docker image (apache/airflow-docker)
> > * helm chart (apache/airflow-helm)
> > * api clients (we already have separate repos for those)
> > (apache/airflow-client-*)
> >
> > I think the only question is where we develop all those (develop !=
> > release). There are certain benefits of having a single "master" (let's
> > call it "development" further) for all those artifacts. Currently the
> > "development" version for all of those is in one repo - and while
> > developing one depends on the other, we also test all of those together
> and
> > this means that "current best" set of airflow sources (including
> > dependencies in setup.py), Dockerfile and Helm chart work. This means for
> > example that you will not be able to break the Helm Chart by changing
> > anything that the helm chart depends on in airflow. For example if you
> > change "airflow webserver" into "airflow server" the current helm chart
> > will break. Similarly if you change entrypoint.sh in Docker image in a way
> > that is not compatible with Helm chart, we will not let that happen - the
> > CI tests will break if either of those changes in an incompatible way. And
> > we can have dependencies in any direction between those three. When we
> see
> > a commit break either of the three - we can make a decision about what to
> > do - either accept and document the incompatibility or fix it.
> >
> > Of course keeping that property (testing it all together) is also
> possible
> > if they are in completely separate repos. There are several
> > cross-dependencies - Docker image building depends on dependencies in
> > setup.py for example, you cannot build Docker image from only Dockerfile
> > without the sources of airflow nor build and test helm charts without the
> > image (and sources - because that's where the current kubernetes tests
> > are). If we want to continue doing it for both Helm and Dockerfile, we
> > would have to basically check out the latest sources of Airflow and run
> the
> > CI tests before merging any Docker or Helm Chart changes and the
> opposite -
> > we will have to download Dockerfile/Helm chart and build image/install
> Helm
> > chart when we are running CI tests for Airflow. This is possible and we
> > could do it, but it adds complexity to the build/CI process.
> >
> > Having such split also makes some updates more difficult - for example if
> > we add new "extra" to Airflow that will require to install "apt"
> dependency
> > in Dockerfile, we will have to split it into first adding the dependency
> to
> > Dockerfile, and once it is merged, we can add the extra to airflow with
> > setup.py. This makes it quite difficult to test it together though (the
> > Dockerfile change can only be tested fully after merging it to master).
> Not
> > mentioning complexity of managing different versions - your local
> > development Dockerfile version vs sources of Airflow for example. Imagine
> > switching between branches where you add two different apt dependencies
> to
> > the Dockerfile. There are more similar scenarios I can imagine -
> especially
> > for parallel changes in those repos.
> >
> > This is of course doable to keep them separate, but it is quite a bit
> more
> > complex to set up (especially for a consistent development environment)
> > when you have separate repos and prevent cross-breaking changes might be
> > more difficult.
> >
> > I believe that the best way is to continue developing airflow + image +
> > chart in one repo - airflow, but release them from those separate repos.
> >
> > Airflow source release does not have to contain either chart or image.
> > And even if it contains sources for those, they are not the final
> > "artifacts" (installable image and installable helm chart).
> > Whenever we decide to release either of them - we test it in
> "development".
> > Then only when it is tested, we copy the sources to those separate repos
> > and release them.
> >
> > With git - we can even do it very easily while preserving the history of
> > commits (been there, done that). And then we could release Helm and
> > Docker image separately based on the commits and tags in those separate
> > repositories.
> >
> > I agree that separate repos is a more "clean" approach. But I think it is
> > less convenient for development consistency.
> >
> > J,
> >
> >
> >
> > On Wed, Jul 1, 2020 at 9:35 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
> >
> > > Forgot to mention, having them in separate repo also helps in better
> > > managing each individual artifacts.
> > >
> > > Each repo would have a separate Github Issue where we can track the
> issue
> > > specific to Helm chart or Dockerfile.
> > >
> > > Regards,
> > > Kaxil
> > >
> > > On Wed, Jul 1, 2020 at 8:30 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
> > >
> > > > The PMC also needs to agree if we want separate VOTING for Docker
> Image
> > > > and Helm chart, I think we do.
> > > >
> > > > Regards,
> > > > Kaxil
> > > >
> > > > On Wed, Jul 1, 2020 at 8:06 PM Kaxil Naik <kaxiln...@gmail.com>
> wrote:
> > > >
> > > >> Hi all,
> > > >>
> > > >> What do you all think about having Dockerfile and Helm chart in the
> > same
> > > >> "Airflow" Repo vs separate?
> > > >>
> > > >> I feel having a separate repo for Airflow Dockerfile and Helm chart
> > have
> > > >> more benefits like easy to track changes (via Changelog), easy for
> new
> > > >> contributors, separate release cadence.
> > > >>
> > > >> Currently, docker file and Helm Chart are inside the same repo and
> > when
> > > >> we release changelog for a new Airflow version, it would include all
> > > >> changes (Airflow + Dockerfile + Helm chart) which I think is not
> that
> > > great.
> > > >>
> > > >> Also having them all inside a single repo means changes in Helm
> Chart
> > > and
> > > >> Dockerfile can block Airflow release. We could use stable Helm Chart
> > > >> version and Dockerfile version to test Airflow so that they are
> > > blockers to
> > > >> release too.
> > > >>
> > > >> Happy to hear the thoughts from the community.
> > > >>
> > > >> Regards,
> > > >> Kaxil
> > > >>
> > > >
> > >
> >
> >
> > --
> >
> > Jarek Potiuk
> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >
> > M: +48 660 796 129 <+48660796129>
> > [image: Polidea] <https://www.polidea.com/>
> >
>


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>
