[git-users] Re: Git enterprise setup on a large project

Daniel Mon, 04 Apr 2011 13:21:10 -0700

Based on our attempts to move a similarly humongous tree of code from
SourceSafe to Git, here are my personal lessons. (These are my own
notes, so I did not type this just for you.)

1. Divide and conquer. (This pertains to the problem, not the code.)
Your problem is actually several separate issues:
a) How you control what gets deployed (following review by a
designated person). Once you get to Git, this is somewhat independent
of your code organization. Leave changes to these steps for last.
b) Which version control tool you use.
c) How you organize the source tree.
d) How you move devs and processes to the new structure.

Here is what worked:
Train, Parallel, Optimize.

1. Before you leave CVS, make sure ALL affected personnel are
intimately familiar with the Git workflow, specifically as it applies
to how you handle your production code. To get there:
a) Do NOT switch to Git yet. Use CVS and Git concurrently: devs still
check out from, pull from and push to the centralized CVS. On their
hard drives, the checked-out folder structure is ONE and the same as
the Git repo (set Git to ignore all the CVS support files). CVS should
NOT know about Git. Trying to "import" all the micro-level commits
from Git into CVS will muddy the water and the process. Only large,
occasional, stable "diffs" go up to CVS.
b) Cultivate a culture of "local micro-revisions and branching,
global clean commits." At this stage, code is pushed vertically to
CVS, and laterally between devs via Git "format patches" (in the case
of "shared" branches) and plain patches.
c) (At this stage) Don't break up the CVS structure into "submodules"
or do anything that removes the parallel between the two.
d) Don't try to make devs push to a "centralized" Git repo. Because
of constant rebasing and history rewriting, it will not work. If devs
aren't constantly rebasing and rewriting history locally, they are not
using Git well. The lack of a centralized Git structure gives devs the
freedom to learn Git most efficiently.
e) CVS is still your main, one and central repo. In case of fire, this
is what you save, and this is where you make all devs commit (albeit
less regularly).
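A minimal sketch of the "one folder, two version control tools" setup from 1.a, in Python purely for illustration. It merges Git-side ignore patterns into a `.gitignore` and CVS-side patterns into a per-user `~/.cvsignore` (CVS reads that file for every checkout, so nothing about Git ever has to be committed to CVS). The pattern lists and function names are my assumptions: CVS keeps its bookkeeping in a `CVS/` directory inside every checked-out folder and leaves `.#file.revision` backups after merges, so adjust them for your tree.

```python
from pathlib import Path

# Assumed CVS bookkeeping that Git should ignore: the per-directory CVS/
# metadata folders, .cvsignore files and the ".#file.revision" merge backups.
GIT_IGNORES_FOR_CVS = ["CVS/", ".cvsignore", ".#*"]

# Assumed Git files that CVS should never see, so CVS doesn't "know" about Git.
CVS_IGNORES_FOR_GIT = [".git", ".gitignore"]


def merge_ignore_lines(existing: str, patterns: list[str]) -> str:
    """Return the ignore-file text with any missing patterns appended."""
    lines = existing.splitlines()
    for pattern in patterns:
        if pattern not in lines:
            lines.append(pattern)
    return "\n".join(lines) + "\n"


def update_ignore_file(path: Path, patterns: list[str]) -> None:
    """Create or update an ignore file without duplicating entries."""
    existing = path.read_text() if path.exists() else ""
    path.write_text(merge_ignore_lines(existing, patterns))


# Usage, from the root of a CVS checkout that is also a Git repo:
#   update_ignore_file(Path(".gitignore"), GIT_IGNORES_FOR_CVS)
#   update_ignore_file(Path.home() / ".cvsignore", CVS_IGNORES_FOR_GIT)
```

Re-running it is harmless: patterns already present are not added twice.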

Once you have a core of devs who are happy and familiar with
"micro-revisions and branching" and who start to beg for a
centralized, formalized Git repo, make sure that FIRST these Git
"champions" work with the others to get them on the "happy local
micro-revisions" board.
Once everyone is ready, undo my 1.d rule ("no centralized "mainline"
Git repo") and make the devs maintain absolute parity (in code, not
commits) between the (still main) CVS repo and the ("convenience")
central "mainline" Git repo.

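One way to make "absolute parity (in code, not commits)" checkable: compare the file contents of two checkouts, for example a CVS working copy and a clone of the central "mainline" Git repo, while skipping each tool's metadata. A minimal Python sketch; the function names and skip lists are hypothetical, and it only reports differences, it doesn't fix them.

```python
import hashlib
import os
from pathlib import Path

# Tool metadata that is not "code" (assumed lists; extend for your tree).
VCS_DIRS = {"CVS", ".git"}
VCS_FILES = {".gitignore", ".cvsignore"}


def tree_digest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to a SHA-1 of its contents."""
    digests = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune metadata directories so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in VCS_DIRS]
        for name in filenames:
            if name in VCS_FILES:
                continue
            path = Path(dirpath) / name
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha1(path.read_bytes()).hexdigest()
    return digests


def parity_report(cvs_root: Path, git_root: Path) -> list[str]:
    """List paths missing on one side or whose contents differ."""
    cvs, git = tree_digest(cvs_root), tree_digest(git_root)
    problems = []
    for rel in sorted(cvs.keys() | git.keys()):
        if rel not in git:
            problems.append(f"only in CVS checkout: {rel}")
        elif rel not in cvs:
            problems.append(f"only in Git checkout: {rel}")
        elif cvs[rel] != git[rel]:
            problems.append(f"content differs: {rel}")
    return problems
```

If your CVS files use keyword expansion (`$Id$`, `$Revision$`), identical code can hash differently on the two sides, so expect noise from those files.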
IMPORTANT NOTE: This is where your future Git repo starts. Start it
right by importing the CVS history into a clean Git repo and making
that the "wild" public Git "mirror" of CVS. If devs mess up this
"wild" public Git repo, you can always toss it and start over, again
and again, by re-importing the CVS history. However, once the "wild"
public Git repo has settled, it's unsightly to rebase its history on
top of a new CVS import, as all the hashes change.
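Git itself ships `git cvsimport` (it relies on the external `cvsps` tool), and `cvs2git` from the cvs2svn project is a common alternative for large or messy histories. A minimal Python sketch of the "toss it and re-import" step, assuming `git cvsimport`; the CVSROOT, module name and target directory are hypothetical. It only builds the command line, so you can review it before running it with `subprocess.run(cmd, check=True)`.

```python
import shlex
from pathlib import Path


def cvsimport_command(cvsroot: str, module: str, target: Path) -> list[str]:
    """Build a `git cvsimport` call that imports a CVS module into `target`.

    -v prints progress, -d names the CVSROOT to read from, and -C names the
    Git repository to import into (created if it doesn't exist).
    """
    return ["git", "cvsimport", "-v", "-d", cvsroot, "-C", str(target), module]


if __name__ == "__main__":
    # Hypothetical values; replace with your own CVSROOT and module name.
    cmd = cvsimport_command(":pserver:anonymous@cvs.example.com:/cvsroot",
                            "healthcare", Path("wild-mirror"))
    print(shlex.join(cmd))  # review, then run with subprocess.run(cmd, check=True)
```

To start over, delete the mirror directory and run the same command again; do that only while nobody depends on the mirror's commit hashes yet.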

By this time, you will see that CVS is committed to maybe once every
2-3 days. Most of the "group" projects exchange "format patches" for
the group "mainline" branches in Git. Most of the stupid commits will
stay in local repos, and your CVS will be in a manageable, "somewhat
stable" state. You will have gained A LOT of parallel efficiency by
this time, as coordinating ("semaphoring") code changes will no longer
be a chore.
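The lateral "format patch" exchange boils down to `git format-patch` on the sender's side and `git am` on the receiver's, which replays each patch as a commit with its original author and message. A minimal Python sketch that builds both command lines; the branch names and the `outgoing` directory are hypothetical.

```python
from pathlib import Path


def export_patches_command(base: str, branch: str, outdir: Path) -> list[str]:
    """Sender: write one mailbox-style .patch file per commit in base..branch."""
    return ["git", "format-patch", "-o", str(outdir), f"{base}..{branch}"]


def apply_patches_command(patches: list[Path]) -> list[str]:
    """Receiver: apply the patches in order as commits.

    format-patch numbers its files (0001-..., 0002-...), so sorting the paths
    restores commit order. --3way falls back to a three-way merge when a
    patch does not apply cleanly.
    """
    return ["git", "am", "--3way", *[str(p) for p in sorted(patches)]]
```

On the receiving side, `apply_patches_command(list(Path("outgoing").glob("*.patch")))` picks up everything the sender exported.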

2. Change processes. Break down the chains.
At this "stable" point, start changing your control process. Work out
for yourself how you keep track of the two (and only two) important
things: "what's in production right now?" and "where can I see the
latest state of all devs' 'semi-stable' work?"
Again, divide the problem into smaller issues.
I suggest branching off "this is in production now" into a separate
public Git repo, shared among the people who trust each other to
deploy to production. It has the same 'tree' structure as the one in
the (by now "wild") CVS and the dev Git repos that mirror it. The flow
of code goes ONLY ONE WAY: wild CVS/Git > "this is in production now"
Git. In return, it allows the deployment team to forget about commit
parity and importing, and instead work with large patch levels that
were tested elsewhere.
Do NOT EVER change code directly in the "this is in production now"
repo, not even to compile. ALL code changes go to CVS (and percolate
to the dev Git repos) and cycle back to the "this is in production"
repo as gargantuan patches. Or, if the deployment people are up to it,
make it a public "clone" of the "wild" Git repo and only ever pull/merge
specific commits or branches from the "wild" Git repo, at critical
times.
The reason you would not care about commit-level parity and tracking
for the "this is in production right now" repo is that you will throw
it away once the "wild" Git repo becomes the main one and you move the
"this is in production now" branch there.
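The ONE WAY rule can be written down as a check. For the clone-and-pull variant, every commit that reaches the "this is in production now" repo must already exist in the "wild" repo, and the production branch may only move forward. A minimal, tool-agnostic Python sketch over a hypothetical commit graph (a dict mapping each commit id to its parent ids); on a real server, setting Git's `receive.denyNonFastForwards` on the production repo enforces the fast-forward half.

```python
Graph = dict[str, list[str]]  # hypothetical model: commit id -> parent commit ids


def ancestors(parents: Graph, commit: str) -> set[str]:
    """All commits reachable from `commit`, including itself."""
    seen: set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def is_fast_forward(parents: Graph, old_tip: str, new_tip: str) -> bool:
    """True if new_tip descends from old_tip, i.e. no history is rewritten."""
    return old_tip in ancestors(parents, new_tip)


def check_production_update(wild: Graph, prod: Graph,
                            old_tip: str, new_tip: str) -> list[str]:
    """Rule violations for moving the production branch from old_tip to new_tip."""
    problems = []
    if not is_fast_forward(prod, old_tip, new_tip):
        problems.append("production history was rewritten (not a fast-forward)")
    # Code may only flow wild -> production, so every commit must come from wild.
    foreign = ancestors(prod, new_tip) - set(wild)
    if foreign:
        problems.append(f"commits not from the wild repo: {sorted(foreign)}")
    return problems
```

An empty list means the update respects both rules; a commit made directly in the production repo shows up as "not from the wild repo".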

Once you have gained procedural structure and control over the "this
is where the most recent, semi-stable work is" and "this is in
production right now" Git workflows...

3. Finally, move to Git. Reapply the chains.
At any point before this, you can revert to CVS. It has your latest
code and full history, just committed in larger, more stable chunks.
CVS is still the one, only, mandatory and main repo for the "latest"
code. I'd say forget about "branches" in CVS by now; just pile up
milestones in it from the "wild" Git repo. If you still think Git is
worth it, read this:

http://nvie.com/posts/a-successful-git-branching-model/

This is one of the best SHORT write-ups on organization in Git, but
the guy is wrong in 2 places:
- Wherever he says "develop branch" you read "master branch"
- Wherever he says "master branch" you read "'this is in production'
branch"
Apart from these two mental lapses on his part, the writer is right on
the money.
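The two corrections above amount to a renaming of the article's long-lived branches. A small Python sketch of the article's merge rules (feature branches start from and merge back into `develop`; release branches start from `develop` and merge into both `master` and `develop`; hotfix branches start from `master` and merge into both), translated into the names used in this post. `production` is a hypothetical name for the "this is in production" branch.

```python
# Branch roles as described in the nvie.com article: where each kind of
# short-lived branch starts and which long-lived branches it merges into.
NVIE_MODEL = {
    "feature": {"from": "develop", "into": ["develop"]},
    "release": {"from": "develop", "into": ["master", "develop"]},
    "hotfix": {"from": "master", "into": ["master", "develop"]},
}

# The two corrections: "develop" -> "master", and "master" -> the
# "this is in production" branch (name assumed: "production").
RENAME = {"develop": "master", "master": "production"}


def translate(model: dict[str, dict]) -> dict[str, dict]:
    """Rewrite every long-lived branch name in the model using RENAME."""
    return {
        kind: {
            "from": RENAME[rule["from"]],
            "into": [RENAME[branch] for branch in rule["into"]],
        }
        for kind, rule in model.items()
    }
```

Both renames are looked up in one pass over the original names; applying them one after the other would turn `develop` into `master` and then into `production`.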

This is your time to shine as a leader. You need to press a structure
similar to the one in the article into the "wild" Git repo. Get the
"deployment" people involved and ask them to help maintain the "this
is in production right now" branch in the "wild" Git repo. They can
still ship patches to the sheltered baby from which they actually
deploy, but they must maintain the branch in the "wild" repo, cycling
the commits as described in the article. Once you are "sort of done"
(because you are never done) pressing the structure into the "wild"
repo, make the "wild" public Git repo the main, one, only, mandatory
repo. CVS is now optional, a backup, whatever...

4. Move the "deployment" team into the "wild" public Git repo.
Instead of moving patches to the "this is in production right now"
repo, have them work off the "this is in production right now" branch
of the "wild" public Git repo.

5. Maybe, just maybe, you will reorganize the tree.
This is where you contemplate submodules.
Submodules are separate repos. Instead of "one repo, multiple
branches", you will have multiple repos, multiple branches, dependency
rules, build system alterations to accommodate all that, and a special
repo management system (selfish plug: https://github.com/dvdotsenko/ges ).
Postpone it.

People think "my libraries are changed separately" and "I can maintain
my top-level apps separately from the libs", hence "I am ready for Git
submodules." That first part is often wrong. Postpone all contemplation
of submodules until later, as they are a painful complication to the
process. Moving to Git alone is good enough.
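Before believing "my libraries are changed separately", you can measure it. A minimal Python sketch, assuming the input is the output of `git log --no-merges --name-only --format=@%H` (each commit header starts with `@`, followed by the paths it touched); the `libs/` and `apps/` prefixes and the function names are hypothetical. A high share of commits touching both sides suggests submodules would turn many ordinary changes into coordinated multi-repo commits.

```python
def parse_name_only_log(text: str) -> list[list[str]]:
    """Split `git log --name-only --format=@%H` output into per-commit paths.

    Assumes every commit header line starts with "@" (from the format string)
    and that no tracked path itself starts with "@".
    """
    commits: list[list[str]] = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("@"):
            commits.append([])
        elif line and commits:
            commits[-1].append(line)
    return commits


def cross_boundary_share(commits: list[list[str]], lib_prefix: str = "libs/",
                         app_prefix: str = "apps/") -> float:
    """Fraction of commits that change both library and application code."""
    if not commits:
        return 0.0
    crossing = sum(
        1
        for paths in commits
        if any(p.startswith(lib_prefix) for p in paths)
        and any(p.startswith(app_prefix) for p in paths)
    )
    return crossing / len(commits)
```

`--no-merges` keeps merge commits (which list no files by default) from diluting the figure.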

Daniel.

On Apr 2, 12:55 pm, FredJ <frederic.jec...@gmail.com> wrote:
> Hi,
>
> I'm looking to replace my team's old and clumsy CVS server. As a Git
> user for personal projects, I'm wondering what would be the best way to
> achieve this using Git. I've googled the subject a few times and read
> books, but I still have a handful of questions.
>
> Our team is composed of 10 developers.
> A common developer workspace takes about 1 GB of space and contains
> about 50 modules (up to 100 if the developer is working on satellite
> applications/modules) - we're maintaining a large java-based
> healthcare system.
> Each module is a CVS project.
> Our use of CVS is... well... clumsy too (don't ask me why we do things
> like this, those were decided about 10 years ago before I started
> working in this company).
> Features to be delivered in the next release are committed directly to
> HEAD.
> The project leader cherry-picks the commits into his local working copy
> within a dedicated workspace.
>
> I first thought of creating a single project in order to group the
> 100+ modules and to ease the creation of maintenance branches.
> Each developer would clone the full workspace from a "blessed"
> repository.
> The "blessed" repository would be managed by the project leader, who
> would simply pull changes from each developer's public repository.
> I tested this scenario but, as the workspace is really big, each
> operation is really slow.
>
> What could I do then?
> - Use submodules ?
> - Create a blessed per-project repository and instead of a public
> repository for each developer create a per-project public and shared
> repository ?
> ...I'm a bit lost
>
> Thanks for your help
>
> Fred

-- 
You received this message because you are subscribed to the Google Groups "Git 
for human beings" group.
To post to this group, send email to git-users@googlegroups.com.
To unsubscribe from this group, send email to 
git-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/git-users?hl=en.
