summary: background, motivation, and plan for a tree-structured development 
process is presented. My question is (roughly), should the "code bucket" to 
which code is committed at each level of the process be implemented by a 
separate repository, a separate branch on a single repository, something else, 
or does it matter?


Apologies for the length of this post, but it seems there's a lot to explain. 
In reverse order, I have a question about a plan, for which I present the 
motivation and some background: feel free to skip over parts, but I suspect it 
"all ties together."


I recently began to work with a group that had just had an embarrassingly 
extended release. The short story is, we kept throwing what we thought was good 
code over the wall to the beta testers, who kept throwing it back--for 5 
months. The long story/etiology includes:

1 Our people are software engineers only by default. They're really scientists 
who code, and who have learned a bit about software engineering "just by 
doing." But their focus has been on what they do with their code, not their 
tools or development process (until now), which can seem pretty crude (at 
least, to a coder who's starting to learn the science, like me).

2 We have a very centralized dev process. There is one CVS repo to which 
everyone commits. (Technically, there are several: since they don't know how to 
branch or create read-only users, they just clone the filesystem every time they 
want to freeze something. But for commits there is only one repo.) Everyone 
commits to HEAD, for the same reasons most CVS users don't branch. Theoretically 
everyone runs a big bucket o' tests before committing; in practice, there's a 
small group (2 guys) who manage releases and actually/reliably test.

3 We have a very long release cycle: several years, for which there are 
apparently some legitimate reasons. But we don't do intermediate integrations, 
or manage dependencies; ISTM, that's just slack, and means that pre-release 
testing follows a painful pre-release integration of code from our many 

Related to this etiology are the following continuing constraints:

4 Resource: our funding is flat, and our group's headcount is actually 
declining (retirees are not being replaced). We are supplied with contractors 
who service our clusters (more below), but no other computing support (other 
than desktop support for "productivity apps" like Lotus Notes). We need more 
contributions from our community of users (which I suspect many could/would 
give), but, for legal reasons (not related to licensing--the code is 
open-source), it's hard for us to enable access to code that has not been 
"fully reviewed." (More on excessive security below.) These are longer-term 
problems :-(

5 Automated testing of large-scale scientific models seems inherently hard. (If 
there's anyone out there working on this problem, please ping me offline--I'd 
like to learn more.) There are ways to attack this in which I'm definitely 
interested, but that's also a longer-term problem.

6 We are not mobile developers. We run and test our code on a couple of 
clusters which are behind some exceedingly strict firewalls--so strict that few 
folks have the ability to VPN (aggravated by the resource constraint), and it's 
painful for those that do. We can't ssh or https out of the cluster, which 
complicates sharing of code (via, e.g., github) and data. Hence folks work on 
code almost entirely from their desks (which are on LANs that have cluster 
access) and not from home or on the road. This is also not likely to change 
anytime soon.


My group intensively uses our tool for our scientific work (we majorly "eat our 
own dogfood"), but we also have a significant external community of users. The 
5-month delay of an announced release was therefore rather embarrassing, and we 
also realize that it wasted lots of time/effort. Now that we're planning for 
the next release, I'm proposing some process upgrades to address those 
problems. Some proposals are no-brainers, or at least are off-topic for this 

* CVS -> git: the following plan presupposes we do this. This is not quite a 
no-brainer, since we'll hafta train folks how to use git, but I can't see 
disadvantages to migration that aren't outweighed by the advantages of git.

* dependency management (a bit more on dependencies below)

* shorter release cycle || intermediate integration builds using specified 
dependencies (and that's a boolean 'or')

(If you've got reasons why not to do those, please post me separately, and not 
on this thread/Subject.)


My final proposal is more complex. I'd appreciate comments on it, particularly 
regarding an implementation detail discussed below. This implementation detail 
reflects the similarities and differences between git repositories (or remotes) 
and their branches. Since in git the difference to the user between {pushing 
code to, pulling code from} any particular branch on any particular repository 
can be made fairly transparent (am I missing something?), I'll just use the 
term "code bucket" to refer to something from/to which one can pull/push.
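
To make the "code bucket" idea concrete, here is a minimal sketch in Python. It only builds the git command lines and does not run them. The class name, host name, repository paths and branch names are all made up for illustration. It assumes a bucket is fully identified by a (repository, branch) pair, in which case pulling and pushing look the same for either implementation:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bucket:
    """A 'code bucket': one branch on one repository (hypothetical names)."""
    repo: str    # a URL, a filesystem path, or a configured remote name
    branch: str

    def pull_cmd(self):
        # git pull <repository> <branch>: fetch that branch and merge it
        # into whatever branch is currently checked out.
        return ["git", "pull", self.repo, self.branch]

    def push_cmd(self, local_branch):
        # git push <repository> <src>:<dst>: update <dst> in the bucket
        # from the local branch <src>.
        return ["git", "push", self.repo, f"{local_branch}:{self.branch}"]


# The same two commands serve a bucket that is its own repository ...
separate_repo = Bucket("ssh://cluster.example/repos/pt-aerosol.git", "master")
# ... and a bucket that is one branch among many on a shared repository.
shared_branch = Bucket("ssh://cluster.example/repos/shared.git", "pt-aerosol")

for bucket in (separate_repo, shared_branch):
    print(" ".join(bucket.pull_cmd()))
    print(" ".join(bucket.push_cmd("master")))
```

Only the arguments differ between the two cases, which is the sense in which the choice can be hidden from the user. What the sketch does not capture is access control and housekeeping, which is where the two implementations would be expected to differ.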

For better testing and evaluation, I'm proposing that we move from a 
centralized process/repository to a tree structure. The release managers (who 
have other jobs--they do this "on the side") are empirically overwhelmed, so 
ISTM we need better "division of labor," i.e., distribution of test and 
integration effort. Furthermore, we already have workgroups which discuss and 
prioritize big function chunks (e.g., chemistry, meteorology, land cover), and 
project groups working on smaller ones (e.g., aerosol nucleation), "in between" 
the individual scientist/coder and the top-level management/repository. (Note 
that everyone belongs to more than one workgroup and project team: software is 
modular, but nature is not.) So I'm trying to leverage those groups to get the 
necessary integration/test work done, and give the release managers "fewer 
throats to choke." The proposal is, bottom up:

1 Each coder gets her/his own bucket, for her/his own code, on which s/he tests 
as s/he will. The main difference between that and the status quo (besides CVS 
-> git) is, s/he will be required to publicly declare (on our group's wiki) 
what test(s) s/he runs.

2 Each project (i.e., one or a few function points we want to add or modify) 
gets assigned to a project team (PT). Each PT

* has a declared lead, who is responsible for that project, and represents the 
PT at workgroup meetings.

* must declare what test(s) it runs on its code.

* has its own separate code bucket. When a member coder wants to "commit," s/he 
requests pull from her/his PT lead, who pulls/merges/tests. The PT evaluates 
the results; if satisfactory, the PT lead commits to that PT's bucket.

3 Each workgroup (WG) is like a super-PT: a WG integrates the code from its 
member PTs in the way that each PT integrates its team members. A WG

* has a declared lead, who is responsible for its set of function, and 
represents the WG when meeting with the release managers.

* must declare what test(s) it runs on its code.

* has its own separate code bucket. When a member PT wants to "commit," it 
requests pull from its WG lead, who pulls/merges/tests. The WG evaluates the 
results; if satisfactory, the WG lead commits.

4 The release managers (RMs) integrate the code from the workgroups. The RMs 
collectively determine, for a given release or integration build (IB),

* dates

* what its dependencies will be (i.e., on what versions of (e.g.) libraries and 
compilers that release or IB must run)

* what function goes in (the determination and arbitration of which seems to 
consume lotsa work)

  The RMs also

* must declare what test(s) they run on the release or IB

* manage the top-level (separate) code bucket. When a WG wants to "commit," it 
requests pull from an RM, who pulls/merges/tests. The RMs evaluate the results; 
if satisfactory, the RMs commit.
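
The four levels above can be sketched as a small data model. This is an illustration only, in Python, with hypothetical names throughout (`Level`, `promotion_path`, `gates`, and the example groups and test names). It models a single project's chain; since everyone belongs to more than one PT and WG, a real coder would sit at the bottom of several such chains.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Level:
    name: str                          # a coder, a PT, a WG, or the RMs
    tests: List[str]                   # the test(s) this level has declared
    parent: Optional["Level"] = None   # the level whose lead pulls from us


def promotion_path(start):
    """Return the buckets a change passes through, bottom up."""
    path = []
    node = start
    while node is not None:
        path.append(node.name)
        node = node.parent
    return path


def gates(start):
    """Return (integrator, declared tests) for each pull request on the way up."""
    steps = []
    node = start
    while node.parent is not None:
        steps.append((node.parent.name, node.parent.tests))
        node = node.parent
    return steps


rms = Level("release managers", ["release suite"])
chem = Level("WG chemistry", ["chemistry regression"], parent=rms)
nuc = Level("PT aerosol nucleation", ["nucleation cases"], parent=chem)
coder = Level("coder", ["own tests"], parent=nuc)

print(promotion_path(coder))
for integrator, tests in gates(coder):
    print(f"{integrator} pulls/merges, then runs: {', '.join(tests)}")
```

Each change from a coder passes three pull requests before reaching a release, and each request is handled by a different lead running that level's declared tests. That is the intended division of labor: the RMs see only what the WG leads have already integrated.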


My general questions are, does the plan above seem

* feasible, given our constraints?

* solvent: does it seem likely to solve the problems described above? (notably, 
that the centralization of our process is overwhelming the folks at the center)

My specific question regards the implementation of the "code bucket" at each of 
the levels above: should it be implemented by

* a separate repository

* a separate branch on a shared repository

* Something Completely Different

? I'm leaning toward separate repositories, but am wondering if there are 
performance or operational details of which I'm unaware, given the following 
constraints. To be more specific, the implementation I currently favor is, for 
each level:

1 Each coder gets a separate git repository on her/his desktop, which is on a 
LAN that can ssh (and therefore run git over ssh) and https into the cluster. 
Unfortunately these are mostly Windows (XP), but I'm presuming git runs well 
enough on that--am I missing something? (I run Debian, and am mostly blissfully 
ignorant of platforms != Linux.) Coders would also be free to create 
repositories on the /home filesystems on our clusters (which run RHEL 5, but 
may soon be moving to CentOS 6). On their repositories, a coder would be free 
to create branches and tags as desired.

2 Each project team lead gets a separate repository on one or both of the 
clusters. (We can ssh/git between the clusters, and between the clusters and 
the desktop LAN, but can neither ssh/git nor https from either cluster to the 
outside world.) PT leads are also free to branch and tag at will on their repo.

3 Each workgroup lead gets a separate repository on one or both of the 
clusters. WG leads are also free to branch and tag at will on their repo.

4 The release managers would maintain a separate repository on one or both of 
the clusters. Branch=master would, at any given time, hold the latest release 
or integration build. Immediately before a release or integration is declared 
(only following its successful testing!), the current contents of branch=master 
would be branched with the date of the integration, or the release number; then 
the contents of the current release/IB would be committed to branch=master. RMs 
may also create other branches or tags to facilitate integration and release.
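
One possible reading of the release step in item 4, sketched in Python as a list of git command lines (built, not executed). The function name and both branch names are hypothetical, and using a merge to get the tested release/IB onto master is an assumption; the plan above only says its contents "would be committed."

```python
def release_commands(archive_branch, candidate_branch):
    """Build the git commands for declaring a release or IB (a sketch).

    archive_branch:   name for the branch that preserves the current master,
                      e.g. an integration date or a release number
    candidate_branch: branch holding the successfully tested release/IB
    """
    return [
        ["git", "checkout", "master"],
        # Preserve what master holds right now under the archive name.
        ["git", "branch", archive_branch, "master"],
        # Bring the tested release/IB into master; --no-ff records a merge
        # commit even when a fast-forward would have been possible.
        ["git", "merge", "--no-ff", candidate_branch],
    ]


for cmd in release_commands("ib-2012-03-01", "ib-candidate"):
    print(" ".join(cmd))
```

Run inside the RMs' repository, this would leave the previous contents of master reachable under the archive branch, with master pointing at the new release or IB.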

Your review is appreciated, Tom Roche

You received this message because you are subscribed to the Google Groups "Git 
for human beings" group.