Hi all,

Ben Cooksley and I would like to get some feedback on further evolutions to 
the organization structure we employ for the repositories at git.kde.org, to 
allow our current usage of CI even as we move farther into the KF5-based 
world.

TL;DR: More indirection in our JSON in kde-build-metadata, not a lot of
end-user visible change, new org. terms: "Division" and "Track" for multi-repo
organization, tracking inter-repo dependencies would change too (sayonara
dependency-data-$branch_group), fewer CI servers turned into melting piles of
slag. +1?

The proposal follows for those who like reading excessively wordy text.

Regards,
 - Michael Pyne

Improving KDE Project Organization: A Proposal
==============================================

18 Aug 2014

Michael Pyne <mp...@kde.org> and Ben Cooksley <bcooks...@kde.org>

This is a proposal to evolve the current method of organizing our mass of KDE
source code repositories, and their dependencies, as contained in the
`kde-build-metadata` repository and used by kdesrc-build and build.kde.org
(referred to as "CI"). This is needed in order to correct some deficiencies in
the [current
specification](https://community.kde.org/Infrastructure/Project_Metadata), and
to help better support changing trends in developer workflow.

Current Situation
=================

If you're familiar with the current organization of "KDE build metadata" you
should skip to the next section.

Currently, the git-based source code repositories that make up KDE.org's
software releases are each given a "project path" that fully specifies the
name of the module in a virtual hierarchy. For instance, kdesrc-build itself
is really "extragear/utils/kdesrc-build", and KDE 4's kdelibs is
"kde/kdelibs".

Since many modules support KDE4 and Qt5/KF5 (or may in the future), some
developers associated with KDE source code repositories introduced the "branch
group" construct, which maps the git repository branch for the majority of
repositories into a few broad groupings, such as "stable-qt4", "latest-qt4"
and "kf5-qt5". Developers and users using kdesrc-build could then use these
groups to easily build the appropriate git branch of the many repositories
needed for current releases of KDE.org software. This also allowed the CI
infrastructure to support testing the development branches of software using
either KDE4 or KF5, in addition to the libraries/Frameworks themselves.

Current Issues
==============

Things have gone fairly well with branch groups, but there have been minor
issues with the construct:

1. The existing metadata listing dependencies between git repositories could
   not support multiple branch groups, because a repository's dependencies
   were not necessarily identical in every branch group it belonged to. We
   worked around this by forking the metadata such that each different branch
   group used a separate dependency file.

2. Compounding that issue, different branch groups would have different sets
   of repositories. For instance some repositories will never have a KF5-based
   release due to ongoing reorganization, and many repositories were born for
   KF5. By common agreement, software using `kde-build-metadata` now
   recognizes an empty git branch name to mean that a repository doesn't
   actually belong to the given branch group. This is still a workaround,
   however; if we forget to manually specify an empty branch, then CI and
   kdesrc-build will both try to build that repository as part of that branch
   group (using a default branch).
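
The pitfall in item 2 can be sketched in a few lines. This is only an illustration: the repository names, the dictionary layout and the `master` fallback are assumptions made for the example, not the actual `kde-build-metadata` format or the real logic in CI or kdesrc-build.

```python
# Hypothetical, simplified per-repository branch-group data.
# Repository names and layout are made up for illustration.
DEFAULT_BRANCH = "master"  # assumed fallback branch

branch_groups = {
    # Explicit empty branch: the repository is not part of "kf5-qt5".
    "kde/qt4-only-app": {"stable-qt4": "4.x", "kf5-qt5": ""},
    # Someone forgot to add an empty "kf5-qt5" entry for this one.
    "kde/forgotten-app": {"stable-qt4": "4.x"},
}


def branch_for(repo, group):
    """Return the branch to build, or None if the repo is not in the group."""
    branches = branch_groups.get(repo, {})
    if group not in branches:
        # No entry at all: the repository is built anyway, from the default.
        return DEFAULT_BRANCH
    # An empty string is the agreed-upon "not a member" marker.
    return branches[group] or None


print(branch_for("kde/qt4-only-app", "kf5-qt5"))   # excluded on purpose
print(branch_for("kde/forgotten-app", "kf5-qt5"))  # built by accident
```

Membership here is opt-out, so a missing entry silently turns into a build of the default branch.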

Upcoming Problems
=================

A larger concern (and what instigated this effort) is that the KF5 era will
introduce multiple development models that are difficult for the CI
infrastructure to efficiently support.

For example, testing the KF5-based Plasma 5 Workspace will eventually need to
test both the stable and development tracks for Plasma 5. Under the branch
group concept, this would lead to branch groups "kf5-qt5" and "kf5-qt5-stable"
(or similar names).

However, the KF5 repositories that Plasma 5 requires do not have a split
between stable and devel: They use a review-required process by which there's
only one development track. In other words, Plasma 5's two development tracks
will only depend on one KF5 track.

At this time, that means CI will have to build 56 KF5 modules to test Plasma
5-stable, and then re-build, re-install, etc. the exact same 56 modules to
then test Plasma 5-devel. This re-build is required because experience has
shown that built repositories cannot be assumed to be compatible between
different branch groups (in fact many repositories are significantly different
on-disk between branch groups). There's simply no data recorded at this point
that delimits the ways in which repositories would remain compatible (or not)
between different branch group combinations.

Solving this (so that the right 56 modules are retained and re-used) would
require quite some manual hackery, and it's uncertain how easy these hacks are
to implement within Jenkins and the CI infrastructure in the first place.

Overview of Proposed Fix
========================

What we would like to do instead is the classic Comp. Sci. fix: Another layer
of indirection.

In this case, we'd like to re-organize the `kde-build-metadata` to map to the
same types of project divisions that we already intuitively utilize ourselves
(i.e.  the repositories that make up Plasma 5 are a different grouping than
those that make up KDE Frameworks 5, which are different from those that make
up KDevelop for KF5, etc.).

Under this scheme, the universe of all (KDE.org) git repositories would fall
into this outline:

    + Division (e.g. KF5)
     + Track   (e.g. "devel")
      + Repositories + Git branches

The following would be true of these divisions:

* Each division/track combination could depend on a different division (e.g.
  Plasma5/Devel could depend on KF5/Devel).

* Each division/track combination would list all git repositories that make up
  that track (wildcards will continue to be permitted), along with the git
  branch of that repository. E.g. Plasma5/Devel could include
  "kde/workspace/plasma-workspace: master", while Plasma5/Stable might include
  "kde/workspace/plasma-workspace: Plasma/5.0".

* The "branch group" concept will be retained (both for backwards compat for
  kdesrc-build users and for ease of Jenkins implementation), and is the "most
  global" grouping (but now, of divisions, not repositories directly).  Each
  division will map global branch group names to one of its tracks, if
  appropriate.

  So "kf5-qt5" might mean "KF5/Devel, Plasma5/Devel, etc." while
  "kf5-qt5-stable" might mean "KF5/Devel, Plasma5/Stable, etc.". If CI builds
  "kf5-qt5-stable" and then builds "kf5-qt5", it would be able to skip
  building "KF5/Devel" the second time as it's stated to be compatible with
  both Plasma5 tracks.

* Any given repository in a branch group would map to 0-1 divisions. 0, since
  a repository simply might not be present at all (and might even be in
  different divisions for different global branch groups...). 1, since there
  must be only 1 possible git branch name for a repository.

* Instead of using a separate dependency file, intra-division dependencies
  would be listed along with the rest of the division/track details.

* Likewise, inter-division dependencies would be supported (but the dependency
  would only be on the repository names, since the branches for that
  repository would be controlled by the division/track combination). This is
  to allow for smaller applications that depend on only a couple of Tier 1 KF5
  repositories to be tested without building all 50+ KF5 modules too.

* You can also simply depend on a division/track combo as a whole, without
  listing each individual dependency (similar to how many apps now depend on
  the virtual "kf5umbrella" repository).

* A division can specify that certain of its tracks are equivalent. For
  instance, FooApp/stable might only require Plasma5/stable, but work
  perfectly fine with Plasma5/devel if it's already available, which is
  something Plasma5 can specify.  This helps reduce combinatorial explosion
  for the CI infrastructure.

* Every repository would need to be a member of *some* Division/Track
  combination to be seen by CI, even small apps.
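
The build reuse described in the "kf5-qt5" example above can be sketched as follows. The mapping of branch groups to division/track pairs is taken from that example (with the "etc." left out). The function name and data layout are assumptions for illustration only and do not describe how Jenkins or the CI scripts would actually do this.

```python
# Branch groups expressed as lists of (division, track) pairs, following the
# "kf5-qt5" / "kf5-qt5-stable" example in the text. Layout is hypothetical.
BRANCH_GROUPS = {
    "kf5-qt5-stable": [("KF5", "devel"), ("Plasma5", "stable")],
    "kf5-qt5": [("KF5", "devel"), ("Plasma5", "devel")],
}


def plan_builds(groups_in_order):
    """List the division/track pairs to build, skipping pairs already built."""
    built = set()
    plan = []
    for group in groups_in_order:
        for pair in BRANCH_GROUPS[group]:
            if pair in built:
                continue  # same division and track: the earlier build is reused
            built.add(pair)
            plan.append(pair)
    return plan


for division, track in plan_builds(["kf5-qt5-stable", "kf5-qt5"]):
    print(f"build {division}/{track}")
```

In this sketch KF5/devel is built once and shared by both Plasma5 tracks, instead of once per branch group.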

Detailed Outline
================

The JSON file already in use in the current specification would be modified to
have (besides the boilerplate), a structure of the following form to hold the
required data:

    "divisions": {
      "KF5": { ... },
      "Plasma": {
        "branch_group_tracks": {
          "kf5-qt5": "devel",
          "kf5-qt5-stable": "stable"
        },
        "divisions_needed": {
          "devel": {
            "Qt5": "devel",
            "Milou": "devel",
            "KF5": "devel"
          },
          "stable": {
            "Qt5": "stable",
            "Milou": "stable",
            "KF5": "devel"
          }
        },
        "repositories": {
          "kde/workspace/*": {
            "devel": "master",
            "stable": "Plasma/5.0"
          },
          "kde/workspace/oxygen": {
            "devel": "master"
            !! Wouldn't be included with the "stable" track at all!
          }
          *All* other modules would be listed here for this group
        },
        "excluded_repositories": [
          "kde/workspace/plasma-nm"  (maybe this goes with a separate division)
        ],
        "dependencies": {
          "*": {  <-- would apply to all tracks
            "divisions": [ <-- could be used to depend on entire divisions
              "KF5"
            ],
            "repositories": {
              "kde/workspace/*": "extragear/base/milou",
              "kde/workspace/plasma-workspace": "kde/workspace/libkscreen",
              more common deps go here...
            }
          },
          ( individual tracks could have added dependencies on repos or even
            whole divisions )
          "devel": {
            "repositories": {
              "kde/workspace/*": "project/only/devel/depends/on"
            }
          }
        }
      },
      "Qt4": { ... },
      etc.
    }
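
As a rough illustration of how a consumer might read the `repositories` and `excluded_repositories` parts of this outline, here is a minimal sketch. It relies on several assumptions that the proposal does not settle: that the most specific matching pattern wins, that exclusions are exact project paths, and that shell-style matching from Python's `fnmatch` module is close enough to the metadata's wildcard rules. The function name `track_branch` is made up.

```python
from fnmatch import fnmatchcase

# The "Plasma" division from the outline, reduced to the parts used here.
plasma = {
    "repositories": {
        "kde/workspace/*": {"devel": "master", "stable": "Plasma/5.0"},
        "kde/workspace/oxygen": {"devel": "master"},
    },
    "excluded_repositories": ["kde/workspace/plasma-nm"],
}


def track_branch(division, repo, track):
    """Return the git branch of repo on track, or None if it is not a member."""
    # Assumption: exclusions are exact project paths, not patterns.
    if repo in division.get("excluded_repositories", []):
        return None
    matches = [p for p in division["repositories"] if fnmatchcase(repo, p)]
    if not matches:
        return None
    # Assumption: the longest (most specific) matching pattern wins, so the
    # explicit "oxygen" entry overrides the wildcard entry.
    best = max(matches, key=len)
    return division["repositories"][best].get(track)


print(track_branch(plasma, "kde/workspace/plasma-workspace", "stable"))
print(track_branch(plasma, "kde/workspace/oxygen", "stable"))
```

Under these assumptions, `oxygen` belongs only to the "devel" track, matching the annotation in the outline, and `plasma-nm` is filtered out of both tracks.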

Some notes:

* The `branch_group_tracks` section is where the global branch-group concept
  (latest-qt4, kf5-qt5-stable, etc.) would be mapped to the appropriate track
  for this division. This is perhaps most useful for CI, though kdesrc-build
  could still utilize it for those who manually list modules to build.

* The `divisions_needed` section would list division/track pairs needed for
  each track in this division. This is not a dependency *per se*, it simply
  indicates to the CI infrastructure that repos from already-built
  division/tracks would not need to be rebuilt if they match the
  division/track requirements contained in this section. However, any
  dependencies for repositories in this division must point to divisions
  listed in this section, so that it's possible to determine the appropriate
  branch to build.

* The `repositories` section would list every single git repository that is
  part of this division/track, using the project path to name the
  repositories, and allowing wildcards as the existing metadata does. You'd
  have to be careful with wildcards not to accidentally include a repository
  from a different division (we anticipate validation tooling to help with
  this).

  You'll also note that it's possible for different tracks to have different
  lists of repositories. It's even possible for a given repo to belong to
  different divisions, which is allowable as long as the graph of
  divisions/tracks for the whole branch group has that repo in no more than 1
  division.

* The `excluded_repositories` section is optional, and would be used in
  situations where it's easier to use wildcards to include too many
  repositories into the division/track, and then filter out the repositories
  that should not be part of the division. It might be easier just to spell
  out each repository however...

* The `dependencies` section is pretty much what it says on the tin, and
  strengthens the "compatibility non-interference" and ordering properties of
  divisions into actual dependencies, and also allows for repository to
  repository dependencies to be expressed for the CI (this would replace the
  dependency-data-foo files in kde-build-metadata).

* The objects under `dependencies` are mappings of tracks to the dependency
  information itself. The `*` track would be used for dependencies common to
  every track of that division.

* `dependencies/$track/divisions` is to allow entire divisions to be declared
  a dependency (the track is not specified, since it's already required to be
  noted in `divisions_needed`), and is optional.

* The `dependencies/$track/repositories` section on the other hand, should
  always be present, at least to specify intra-division dependencies as needed
  by both CI and kdesrc-build. These dependencies are between *repositories*,
  not divisions, and don't include any branch information (since branches are
  now entirely determined by which division/track combination contains a
  repository).

  Repository dependencies can cross division boundaries (which is why every
  repo is required to be part of some division/track combination).
  Cross-division dependencies would still require an entry in
  `divisions_needed` (Milou, in this case) to figure out which track to use.
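
The notes on `dependencies` and `divisions_needed` can be read together as in the sketch below. It shows one possible way to combine the `*` entry with a track-specific entry, and to check that whole-division dependencies are also listed in `divisions_needed`. The helper names are hypothetical, and each repository maps to a single dependency string only because the outline above is written that way; a real format would presumably need lists.

```python
# The "Plasma" division from the outline, reduced to the parts used here.
plasma = {
    "divisions_needed": {
        "devel": {"Qt5": "devel", "Milou": "devel", "KF5": "devel"},
        "stable": {"Qt5": "stable", "Milou": "stable", "KF5": "devel"},
    },
    "dependencies": {
        "*": {
            "divisions": ["KF5"],
            "repositories": {
                "kde/workspace/*": "extragear/base/milou",
                "kde/workspace/plasma-workspace": "kde/workspace/libkscreen",
            },
        },
        "devel": {
            "repositories": {
                "kde/workspace/*": "project/only/devel/depends/on",
            },
        },
    },
}


def dependencies_for(division, track):
    """Combine the '*' entry with the entry for one specific track."""
    merged = {"divisions": [], "repositories": []}
    for key in ("*", track):
        entry = division["dependencies"].get(key, {})
        merged["divisions"] += entry.get("divisions", [])
        # Kept as (dependent, dependency) pairs so that the same pattern can
        # appear in both the common and the track-specific entry.
        merged["repositories"] += sorted(entry.get("repositories", {}).items())
    return merged


def undeclared_divisions(division, track):
    """Whole-division dependencies that divisions_needed does not cover."""
    needed = division["divisions_needed"].get(track, {})
    wanted = dependencies_for(division, track)["divisions"]
    return [name for name in wanted if name not in needed]


print(dependencies_for(plasma, "devel"))
print(undeclared_divisions(plasma, "stable"))
```

A validation tool could run a check like `undeclared_divisions` over every division and track, and refuse metadata for which the result is not empty.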

Next Steps
==========

Porting to the proposed new system would require code changes in both
build.kde.org and kdesrc-build, testing, and setup of the required metadata in
`kde-build-metadata`, with the wider community to be kept informed as progress
is made.

The hope with all of this is to manage the complexity that arises from the
interdependencies of git repository+branch combinations, in a way that allows
us to maintain the value of using our CI testing infrastructure without
needlessly recompiling and reinstalling software that should be compatible,
and to do all of this in a way that aligns with our intuitive understanding of
how we now organize our projects.

We await your comments, suggestions, clarification requests, and other
feedback.
