Resending this, hopefully now without bounces...

On Thu, Dec 20, 2007 at 06:03:33PM -0800, Tom Gal wrote:

> > The answer is simply that git/mercurial/bzr/darcs store much more
> > metadata than CVS/SVN.
> >
> > Therefore, git/mercurial/bzr/darcs are, by definition, no worse.  And
> > yet, they have the possibility of being better.
>
> Sorry. You meant better functionality-wise, but worse storage-, time-, and
> network-bandwidth-wise, right?

Plus all of these perform differently on different kinds of data sets.  The
extra metadata isn't really what takes up the space, though.

Most revision control systems store file deltas.  Some, such as darcs and
arch, take this to the extreme, where the patch is the actual object in the
repository.  With a delta-based store, accessing old versions requires that
the patches be at least partially applied.  Usually the 'annotate'
operation stresses this the most, but even just retrieving an old version
can be slow.
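To make that concrete, here's a minimal sketch of a delta-based store (a
made-up format, not any particular system's): the newest version is kept
whole and older versions are reverse deltas, so checking out an old
revision means walking back through the whole chain.

```python
# Hypothetical delta store: newest version whole, older versions as
# reverse deltas.  Each delta is a list of (offset, length, replacement)
# edits against a list of lines.

def apply_delta(lines, delta):
    """Apply one reverse delta to a list of lines."""
    result = list(lines)
    # Apply edits from the highest offset down so earlier offsets stay valid.
    for offset, length, replacement in sorted(delta, reverse=True):
        result[offset:offset + length] = replacement
    return result

def checkout(head_lines, deltas, revision):
    """Walk back from head, applying deltas until reaching `revision`.
    deltas[0] turns head into head-1, deltas[1] into head-2, and so on."""
    lines = list(head_lines)
    for delta in deltas[:revision]:
        lines = apply_delta(lines, delta)
    return lines

head = ["line one", "line two v3", "line three"]
deltas = [
    [(1, 1, ["line two v2"])],              # head -> previous version
    [(1, 1, ["line two v1"]), (2, 1, [])],  # previous -> oldest
]
print(checkout(head, deltas, 2))  # -> ['line one', 'line two v1']
```

The cost of `checkout` grows with how far back you go, which is exactly
why 'annotate' (which touches many old revisions) hits this store the
hardest.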

Git started out at the other extreme: it began by storing the complete
contents of every version of every file.  Delta compression was implemented
later, and it is even possible to ask git to re-deltify an existing
repository (with git-repack) without changing the resulting data.  My
experience, as well as others' reports, is that this lazy delta compression
actually ends up using significantly less space than a more traditional
system.
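The reason re-deltifying can't change the resulting data is that git's
objects are content-addressed.  Here's a simplified sketch (not git's real
on-disk format): every version is stored whole, keyed by its hash, so
identical content dedupes for free, and delta compression can be layered on
later as a repacking step without changing any object IDs.

```python
# Simplified content-addressed store, in the spirit of git's object
# database (not its actual format): objects are keyed by the SHA-1 of
# their content, so storage layout can change without changing IDs.
import hashlib

class ObjectStore:
    def __init__(self):
        self.objects = {}

    def put(self, data: bytes) -> str:
        oid = hashlib.sha1(data).hexdigest()
        self.objects[oid] = data   # full content; deltifying is a
        return oid                 # separate, ID-preserving step

    def get(self, oid: str) -> bytes:
        return self.objects[oid]

store = ObjectStore()
a = store.put(b"hello\n")
b = store.put(b"hello\n")   # same content -> same id, stored once
assert a == b and len(store.objects) == 1
```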

The other big difference in storage is that all of the systems mentioned
(git, mercurial, bzr and darcs) store revisions of entire trees rather than
tracking individual files.  This means that retrieving the history of a
single file requires trudging through the entire branch's history and
filtering for that file.  But it also allows some useful things, such as
asking for the history of a group of files or directories.
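A toy model of that filtering step, with commits reduced to whole-tree
snapshots (a hypothetical representation, not any system's real one):
per-file history falls out of walking the branch and keeping only the
commits where that path's content changed.

```python
# Sketch: each commit snapshots the whole tree as a path -> content
# mapping.  File history = walk every commit and filter.

def file_history(commits, path):
    """commits: oldest-first list of {path: content} snapshots.
    Returns indices of commits that changed `path`."""
    history, prev = [], None
    for i, tree in enumerate(commits):
        cur = tree.get(path)
        if cur != prev:
            history.append(i)
        prev = cur
    return history

commits = [
    {"README": "v1", "main.c": "int main(){}"},
    {"README": "v2", "main.c": "int main(){}"},   # only README changed
    {"README": "v2", "main.c": "int main(){return 0;}"},
]
print(file_history(commits, "main.c"))  # -> [0, 2]
```

Note that the walk visits every commit on the branch regardless of which
file you ask about; asking about a directory or a group of files is just a
different filter over the same walk.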

Most of the decisions that went into something like CVS, or even Perforce,
were made at a time when filtering all revisions to find a given file, or
reconstructing a long chain of deltas to produce one output, would have
been prohibitively costly.  This new generation of revision control is
possible at all largely because of the drastic increase in processor and
disk performance.

As far as network bandwidth, I've found git to be quite good.  It uses the
same delta compression method, based on the data that it determines is
shared by both sides.  The deltas are not computed just against the
previous version of the file, but against many past versions, along with
data from other files chosen by heuristics.
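The idea behind choosing among many candidate bases can be sketched like
this (a stand-in using difflib, not git's actual pack encoding): for each
object to send, try delta-compressing it against several bases the receiver
already has and keep whichever yields the cheapest delta.

```python
# Sketch of base selection for network deltas, not git's pack protocol:
# pick the candidate base that produces the cheapest delta to the target.
import difflib

def best_delta(target: str, candidates: dict):
    """Return (base_name, opcodes) for the cheapest delta, measured here
    by the count of non-equal opcodes -- a crude stand-in for encoded
    delta size."""
    best = None
    for name, base in candidates.items():
        ops = difflib.SequenceMatcher(None, base, target).get_opcodes()
        cost = sum(1 for op in ops if op[0] != "equal")
        if best is None or cost < best[2]:
            best = (name, ops, cost)
    return best[0], best[1]

candidates = {
    "file_v1": "alpha beta gamma",             # an older version
    "other_file": "completely different text", # heuristic candidate
}
base, _ = best_delta("alpha beta gamma delta", candidates)
print(base)  # -> file_v1
```

Real git additionally sorts and windows candidates by size and name
similarity, which is what lets deltas cross file boundaries.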

Dave


--
[email protected]
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-list
