At 2007-03-08T18:15:58+1300, John Carter wrote:

> 1) Our CVS repository for one of our projects is 1gb in size, although
>    the app is about 55Mb in size. ie. Lots and lots of history and tags
>    and...
> 
>    Usually every developer has several checkouts on their box at one
>    time as the work and collaborate on different things. ie. 55mb * 10
>    maybe.
> 
>    So does this mean each developer would need 1gb of disk space for
>    the repository + 55Mb per checkout (not too bad). Or does this mean
>    that each check out of a version onto a local drive includes all
>    versions ie. 1Gb per checkout (which would be awkward)?

The former.

You should find that a monotone repository containing that development
history is significantly smaller than the CVS repository because we use
binary deltas and zlib for storage.

This assumes the history was created in monotone or imported using a perfect
conversion tool.

This might not be true if you just took the CVS repo and imported it with
the cvs_import command on mainline--it doesn't reconstruct branches, so you
can get quite a bit of duplication in the converted history.  I don't
remember exactly how it fares, but I think we come out about equal after a
conversion.  There's a bunch of work going on in a side branch to improve
our CVS importer, so the size advantage should also hold for imported CVS
repos in the future.

For an idea of the compact history storage, I've got a project in monotone
where the source is ~390MB when checked out.  The repository has ~1200
revisions and 10 branches, and is ~250MB on disk.  It could probably be a
fair bit smaller, but half of the branches aren't attached in history
because I was lazy and in a hurry when I imported it manually from
Subversion.

I like to point out that the same source checked out of Subversion directly
measures closer to ~900MB (and 4x the number of files) because of the
pristine file store and other bits of magic stored in the .svn directories.
You don't even get a local copy of history with this, just a way to make
'diff' and 'revert' work without hitting the network...

As another example, monotone itself is 23MB checked out.  A pull of
everything from the server running on venge.net (our shared server) results
in a 130MB repository containing ~10,000 revisions and 193 branches.  Not
all of the 130MB of history is monotone; we also host a few related projects
such as GuiTone and ViewMTN.

> 2) By "sync" repositories do you mean
>  * "this particular branch tip" is the same as "that particular revision" 
> or do you mean
>  * "all revisions ever, no matter how embarrassing, that ever were in
>    my repository are now in yours and yours in mine"
> or what precisely?

The latter.  This is actually really useful and important history, and
there's no point throwing it away!  Your mistakes and experiments serve as
useful documentation for the next person who comes along to hack on your
code.

People seem to be quite shy about this at first, and ask for some way to get
the former... they almost always change their minds once they get used to
the idea.

And, if you really want to go to the effort to hide the development you did
to get from the revision you based your work on to the revision you want to
publish, you can do that by generating a patch and publishing that instead.
This might include some variant where you have a private and public branch,
with the public branch only containing revisions generated from roll-up
patches from the private branch.

> >Coding style:
> >- don't use pointers
> >- don't use heap allocation
> 
> the sort of rules I would expect in an embedded system, curious to
> find it here...

Well, you really want a VCS to be robust.  These rules eliminate entire
classes of very common bugs.

> Hmm. Let me look at this HACKING file, browse source. AARGH! Scary!
>   http://viewmtn.angrygoats.net/
> which is the One True Source?

Heh, yeah, that installation of ViewMTN happens to suck in branches from
any project the maintainer of ViewMTN is interested in, or from any project
that asks him nicely enough.  It is a bit overwhelming for new users--I've
updated the ViewMTN link on the website to point to our main branch
directly.

The main branch for monotone is net.venge.monotone.  Pretty much everything
else under net.venge.monotone.* is a branch of the monotone source, or
strongly related work.

> Hmm. I think I have found it...
>  
> http://viewmtn.angrygoats.net/revision/browse/f93b47fe55221c5ce51cc01e522ec0b92df49a2b

That's the head of the 0.33 release branch.

> Hmm. Ok, do you mean just use automatic stack variable and auto_ptrs?

Sure, place your data on the stack, and if it's a large data structure, pass
it by (const) reference.
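
As a rough sketch of what that looks like in practice (file_entry,
total_size and example_total are made-up names for illustration, not
monotone code):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record type, for illustration only (not from monotone).
struct file_entry
{
  std::string path;
  std::size_t size;
};

// The (potentially large) vector is passed by const reference: no copy is
// made, no raw pointer is involved, and the callee cannot modify it.
std::size_t total_size(std::vector<file_entry> const & entries)
{
  std::size_t total = 0;
  for (file_entry const & e : entries)
    total += e.size;
  return total;
}

// 'entries' is an automatic (stack) variable; its destructor releases
// everything it owns when the function returns, even if an exception is
// thrown part-way through.
std::size_t example_total()
{
  std::vector<file_entry> entries;
  entries.push_back(file_entry{"README", 10});
  entries.push_back(file_entry{"HACKING", 32});
  assert(entries.size() == 2);
  return total_size(entries);
}
```

There is no new, no delete and no pointer anywhere in the calling code, so
there is nothing to leak and nothing to dangle.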

There are a very few exceptions where heap allocation and pointers are the
best way to implement something--in these places we think carefully and hard
to be sure this is the right approach, then use boost::shared_ptr (a smart
pointer like std::auto_ptr, but reference-counted so ownership can be
shared) or a variant to manage the resource's lifetime.
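
A minimal sketch of the idea follows.  To keep it free of external
dependencies it uses std::shared_ptr, the standard-library descendant of
boost::shared_ptr, rather than the Boost class itself; 'connection' and
'lifetime_demo' are hypothetical names, not monotone code:

```cpp
#include <cassert>
#include <memory>

// Hypothetical heap-allocated resource that counts its live instances.
struct connection
{
  static int live;
  connection() { ++live; }
  ~connection() { --live; }
  connection(connection const &) = delete;
  connection & operator=(connection const &) = delete;
};
int connection::live = 0;

// Returns (live count while one owner remains) * 10 + (live count after
// the last owner has gone out of scope).
int lifetime_demo()
{
  int observed = 0;
  {
    std::shared_ptr<connection> a = std::make_shared<connection>();
    {
      std::shared_ptr<connection> b = a;  // second owner of the same object
      assert(a.use_count() == 2);
    }
    // 'b' is gone, but 'a' still owns the object, so it is still alive.
    observed = connection::live;
  }
  // The last owner is gone; the object was destroyed automatically.
  return observed * 10 + connection::live;
}
```

The object is destroyed exactly once, when the last owner goes away, without
anyone writing delete.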

There are a bunch of things we use that perform heap allocation behind our
backs (e.g. lots of the STL), but we don't need to worry about dealing with
pointers, lifetime management, etc. with those.
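
For instance, in a sketch like this (group_by_length is a made-up helper for
illustration), the map, the vectors and the strings all allocate from the
heap internally, yet the code that uses them only ever deals in values and
references:

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Groups words by their length.  All heap allocation happens inside the
// containers, which also free it again in their destructors.
std::map<std::size_t, std::vector<std::string>>
group_by_length(std::vector<std::string> const & words)
{
  std::map<std::size_t, std::vector<std::string>> groups;
  for (std::string const & w : words)
    groups[w.size()].push_back(w);
  return groups;  // returned by value; no pointer handed to the caller
}
```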

> You say you did this as a Summer of Code thing? Are you still a
> student? What year doing what?

The project has participated in Summer of Code both of the previous times it
has been run, and we'll be signing up again for 2007.  I'm not a student--my
interaction with SoC has been as a mentor for the monotone project, and at a
subsequent post-SoC mentor summit at Google.

Cheers,
-mjg
-- 
Matthew Gregan                     |/
                                  /|                    [EMAIL PROTECTED]
