Re: I want a pony: Distributed RCS

Malcolm Tredinnick Wed, 10 Sep 2008 18:41:31 -0700


On Wed, 2008-09-10 at 18:58 -0600, Jeff Anderson wrote:
> Malcolm Tredinnick wrote:
[...]
> > You don't even begin to approach why this might be a good idea for
> > Django. So, what does it gain?
> >
> > Right now, you can already use your distributed VCS of choice with
> > Django and subversion. Some of us have been doing that for literally
> > years. The only time I ever use "svn" is on the very rare times I want
> > to alter subversion metadata properties. However, subversion is a very
> > good lowest common denominator for everybody to use as the central
> > repository and it makes a lot of sense to continue to have a central
> > repo.
> >   
> The problem with going with the lowest common denominator is just that--
> the lowest common denominator also means fewer features.


*shrug* That doesn't mean it has fewer features than we need.

>  In this case, it
> means I'm stuck with subversion's linear development. Non-linear
> development is a requirement for the distributed model of software
> development.

At some level, yes. It still linearises eventually, since a changelog is
an ordered list of changes and only one thing at a time lands in the
code that is finally released.

> If I want to start a branch in my own repo, I can do that. The problem
> happens when merge conflicts start happening. I'm forced to do things
> "the subversion way" when I'm stuck with a subversion backend. I
> **must** rebase my work rather than merge.

You can work out the merge conflicts and fix them up that way.

>  This isn't really a good
> thing in the distributed environment. It also breaks my ability to
> directly check in things from my branches and repos-- when people are
> constantly rebasing their work, I lose any ability to really track their
> branches, and almost all advantages of using a distributed RCS.

Keep in mind that tracking work from the central repository is only
one part of development. As will come up again below, far more
work goes into preparing a final feature than into the code change
that eventually lands. Having to linearise one branch on your side,
rather than having the tool do it automatically, is a concession.
But it's a useful concession, since it enables a much larger audience to
participate as well. Using distributed tools, and understanding how to use
them well, is hard. You've apparently done a fair bit of research and
hands-on use here. I know I have, too. And yet we have some different
opinions about workflow and capabilities. And we're two people. Now
multiply that by 10,000. Factor in those who haven't used any version
control system before. Subversion itself is tricky enough. A low barrier
to entry and contribution is a requirement. Those of us wanting to use a
more distributed model can do so (and are doing so), but some
accommodation of everybody else is necessary.

Short version: there are some trade-offs. They're all possible to work
around if you choose to. For those of us wanting to use that model, the
advantages usually outweigh the trade-offs, and those who are more
comfortable doing things other ways aren't being forced away.

> Continuing to have a single, central repo isn't exactly moving to the
> distributed model of development. I didn't realize that I needed to
> explain the differences between the central repository model and the
> distributed model, but I'll try.

Yeah, thanks. I was wondering where I'd left those instructions about
how to suck eggs. :-)

Yes, I'm joking. Maybe you thought I was clueless, so you tried to fill
in the blanks. Fair enough. That's being helpful.

Seriously, I've been using distributed version control systems for a few
years now. I track a number of projects that use them, and I use them
a lot personally, for both open source and client work. Some are truly
decentralised; others are merely distributed, with a more obvious central
node a la Django. All are distributed.

You're still talking about how this affects your workflow, not about why
it's better for Django. You listed a bunch of possibilities, but not how
they help anything beyond mostly eliminating the periodic merge conflict,
and those conflicts won't be that common.

You *can* still work on branches and exchange them with other people using
distributed systems. You'll need a branch that tracks Django, and you
periodically merge from that into your published development branches.
That's fine. Commit ids are stable in, for example, git-svn, so merges
will be the same for everybody who merges from a subversion-tracking
branch into their development branch. That is, everybody pulling from
subversion gets the same commit id for the same upstream commit; it just
won't necessarily match the id the change had on a development branch
that wasn't pulled from subversion. It's the standard rebase issue. I
would like to think other DVCSs are similarly stable. Yes, there are a
few little oddities when merging changes you already had, which were
then passed upstream and come back as a merge with a different commit,
but that's relatively minor in the grand scheme of things.

When you sit down and think about it, most development doesn't actually
result in a commit to djangoproject.com upstream (there's more
back-and-forth in the development phase than in the final patch).
Distributed systems make creating branches very easy, so after a big
block of work has been accepted upstream, it's not particularly hard to,
for example, stop using the branch you developed it on and start a
different one for the next feature. That's not unusual practice even in
highly distributed projects like the Linux kernel, since it keeps new
features isolated from each other as much as possible.
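To illustrate the commit-id point, here is a simplified Python model of git-style content addressing. It is a sketch, not git's exact object format: real commits also carry a committer line, timestamps, and a tree hash computed from the files, so `commit_id` is a hypothetical helper and the tree ids are placeholder strings. It assumes everybody imports from subversion with identical settings (for example, the same authors mapping), which is what makes the metadata, and therefore the id, identical.

```python
import hashlib


def commit_id(tree: str, parents: list[str], author: str, message: str) -> str:
    """Hypothetical, simplified git-style commit id.

    Hashes a "commit <len>\\0" header plus a body of tree, parent, author
    and message. Real git commits also include a committer line and
    timestamps; here the author string stands in for all that metadata.
    """
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {author}")
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode()
    header = f"commit {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


# Two people importing the same upstream revision, with the same parent and
# the same import metadata, compute the same id independently.
base = commit_id("t0", [], "svn-import", "r8000")
alice = commit_id("t1", [base], "svn-import", "r8001")
bob = commit_id("t1", [base], "svn-import", "r8001")

# A local feature commit, then the same change rebased onto the newer
# upstream commit: identical content, different parent, different id.
feature = commit_id("t2", [base], "jeff", "Add feature")
rebased = commit_id("t2", [alice], "jeff", "Add feature")

print(alice == bob)        # same upstream commit, same id
print(feature == rebased)  # rebased commit gets a new id
```

Only the parent differs between `feature` and `rebased`, yet the ids diverge; that is why work kept on a branch that is rebased from upstream is hard for others to track, and why merging the tracking branch into your own branches avoids the problem.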

At that point, you can publish your branches and happily work back and
forth with anybody else using the same workflow.

What will still happen, though, is that the central version of Django,
the thing that is called "Django" and is released, is based off a
particular branch, which is the one synced from our central subversion
repository. This actually happens even in distributed development. When
something is released, it is released from *somewhere*. There is a
particular commit on a particular branch in the entire universe of
versions of the code that is called the release. We choose for that
location to be the subversion repository. This isn't contrary
to distributed development at all. It's saying ahead of time that there
is a "master" version that things feed up to (nothing about distributed
development says a hierarchy of checked-out versions isn't possible;
it's simply neither required nor forbidden).

Built on that basis, the rest still comes down to workflow. At some
point, necessary changes have to filter back to the main place from
which the release will be done.

>  They are very different philosophies.
> I'm suggesting that Django considers this philosophy of developing
> things in a distributed fashion. I'm not suggesting that Django continue
> using a centralized repo model, and simply switch from svn to another
> tool. I'm sorry if that's what it sounded like.
> 
> A distributed model would mean abandoning this notion of committers and
> non-committers, and thus also the concept of a central repository.

That's not necessarily something that follows from the definition. It's
one way it can work, but it's the *only* way only if you choose a
restricted definition of a phrase that is new enough not to have a
canonically obvious meaning yet.

>  There
> are plenty of blog posts and documents about this approach to software
> development, their benefits, and weaknesses. I highly suggest doing
> research on this approach if you aren't terribly familiar with it.
> 
> One way that it *might* work for Django is each component would have
> someone that "watches over" it. 

That won't really work for us, since we rely heavily on many eyes making
things work. Commits to the "final resting place" for things that will
ultimately make it into the release give us one checkpoint through which
everything passes. Anybody and everybody can watch that and review the
code. Many bugs are caught that way. Given the size of our developer and
contributor base, the abilities of both and the relatively small size of
our codebase, this is a pretty good model.

> Someone would be over the translations,
> someone would be over forms, brosner would probably be over the admin
> app, etc. Translations, I believe, are a good example. A translator for a
> particular language or locale would update their working copy and
> commit. Their changes would get merged into the translation manager's
> repo. Generally, a release maintainer would be the one that merges in
> stable/completed features into their git repo, so they'd merge in
> anything when the translation maintainer says he has more stuff ready.

You've just described a hierarchical system of merges that is the same
as what we have now. Everything filters up to the subversion repository.
You can still use whatever system you like down below and trade back and
forth between people using similar systems. The only concession to
having something that needs to be rebased (the svn -> git conversion, say)
is that you don't do your development on the branch that gets updated
from upstream, but rather merge that branch into your own.

Remember we're a relatively small project. There doesn't need to be more
than a single layer of "formal" hierarchy for merges going into the
thing targeted for release. And as each new layer is added, it really
does get harder and harder to track what's going on in places you're
interested in.

> This is very different from the way that things currently work. There
> wouldn't really need to be any formal decisions about "who is in" and
> "who is out" for commit access.

There is nothing at all stopping you from publishing your own repository
of Django changes and pulling changes from whomever you want. So
everybody's already a committer on some level. Again, it's a difference
in workflow, not capabilities. Right now, the "commit bit" for the
central subversion repo just controls who can do the final push to what
we use as the basis for a release. It doesn't have any influence over
who can develop work, how they do so, or who can ultimately propose it
for inclusion.

I'm personally far from convinced that the features you've outlined add
enough advantages, or remove enough of the larger problems in our
workflow, to justify the retraining, community upset and *much* higher
barrier to entry that it would require. Don't think of this as
"either/or": you can still use a DVCS for development of new stuff,
and the only interaction with subversion is at the interface to the
final Django version.

Regards,
Malcolm



--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en
-~----------~----~----~----~------~----~------~--~---
