On Mon, 27 Apr 2009 11:23:32 -0400, John Nielsen <li...@jnielsen.net> wrote:
> On Saturday 25 April 2009 09:12:50 pm Giorgos Keramidas wrote:
>> On Fri, 24 Apr 2009 05:35:34 -0400, John Nielsen <li...@jnielsen.net> wrote:
>> > I'm working on a machine learning project and I'd like to use the
>> > FreeBSD src CVS commit history as a datasource. Is there a
>> > resource-friendly way for me to download some or all of it? Format
>> > isn't too big an issue.
>> >
>> > I tried a few "cvs history" commands against the anoncvs servers but
>> > get this: cvs [history aborted]: cannot open history file:
>> > /home/ncvs/CVSROOT/history: No such file or directory
>> Do you really want just the `CVSROOT/history' file?  We allow mirroring
>> of the entire repository, which you can then use to extract any sort of
>> historical commit data.  (Well, _almost_ anything.  Some things like
>> repo-copies and renames of raw repository files have been done without
>> any sort of record, so it may be impossible to recover *those*
>> particular bits.)
> I'm basically looking for a list of all commits over the past N (>2)
> years with committer, timestamp, affected file(s) and/or subsystems
> and possibly diff size information, etc. I don't know anything about
> the "history" file in particular other than that's what cvs complained
> about when I tried the "cvs history" commands against anoncvs. It
> looks like the /pub/FreeBSD/development/FreeBSD-CVS/src ftp path may
> have what I'm looking for (though it may be scattered through the
> individual files). I'll probably (try to) set up a local CVS repo and
> source it from there and see where that gets me. My CVS-fu is weak so
> I'm still open to pointers.

There are online instructions for mirroring a full CVS copy, so it
should be relatively easy to do that.  It mostly boils down to setting
up the necessary disk space somewhere locally, installing one of the
CVSup ports and configuring a `supfile' like this:

    *default host=CHANGE_THIS.freebsd.org
    *default base=/path/to/local/cvs/mirror
    *default prefix=/path/to/local/cvs/mirror
    *default release=cvs
    *default delete use-rel-suffix
    *default compress
    # collection to fetch (assumed here: the whole repository)
    cvs-all

You should replace `CHANGE_THIS' with the hostname of a CVSup mirror (a
full list can be found in the Handbook), and change
`/path/to/local/cvs/mirror' to the directory where you will keep the
mirror.
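
If you would rather not edit the file by hand, the substitution above
can be sketched as a small shell helper.  The function name
`write_supfile', the host `cvsup.example.org' and the paths are
placeholders of mine, and the trailing `cvs-all' collection line is an
assumption about what you want to fetch (the whole repository):

```sh
#!/bin/sh
# Sketch: write a CVSup supfile for a full repository mirror.
# write_supfile is a hypothetical helper, not part of CVSup.
write_supfile() {
    host=$1      # hostname of a CVSup mirror (see the Handbook list)
    mirror=$2    # local directory that will hold the mirror
    out=$3       # supfile to create
    cat > "$out" <<EOF
*default host=$host
*default base=$mirror
*default prefix=$mirror
*default release=cvs
*default delete use-rel-suffix
*default compress
# Assumption: cvs-all selects the whole repository.
cvs-all
EOF
}

# Example (placeholder values):
#   write_supfile cvsup.example.org /path/to/local/cvs/mirror supfile
```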

To pull over the CVS mirror files, you can then run:

    # cvsup -g -L 2 supfile

Note that this will take quite some time if you are starting from an
empty mirror, and it may be a good idea to rerun cvsup 1-2 times after
it's done, to make sure you have the latest changes -- including any
changes that were committed between the time you started mirroring and
the time the first run was done.
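
The reruns can be scripted.  This is only a sketch: the function name
`mirror_passes', the default of three passes and the `CVSUP' override
variable are my own choices, not anything CVSup provides:

```sh
#!/bin/sh
# Sketch: run cvsup several times so that the last pass picks up
# commits made while the earlier, slower passes were running.
# CVSUP can be overridden (an assumption, handy for dry runs).
: "${CVSUP:=cvsup}"

mirror_passes() {
    supfile=$1
    passes=${2:-3}
    i=1
    while [ "$i" -le "$passes" ]; do
        echo "cvsup pass $i of $passes"
        # Stop at the first failing pass instead of looping on errors.
        "$CVSUP" -g -L 2 "$supfile" || return 1
        i=$((i + 1))
    done
}

# Example: mirror_passes supfile 3
```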

FYI, my local copy of the repository uses around 4 GB today, so you
should plan to keep the mirror on a disk with at least this amount of
space (a few extra GB won't hurt either):

    # du -sh /home/ncvs
    4.0G    /home/ncvs
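
If you want to check the target disk before starting, something along
these lines should do.  It is a sketch; `has_space' is a made-up helper
name and the 6 GB figure in the example is just 4 GB plus some headroom:

```sh
#!/bin/sh
# Sketch: succeed only if the filesystem holding a directory has at
# least the requested number of kilobytes available.
# has_space is a hypothetical helper name.
has_space() {
    dir=$1
    need_kb=$2
    # df -P prints one POSIX-format line per filesystem; -k reports
    # 1 KB blocks, and column 4 is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$need_kb" ]
}

# Example: require about 6 GB before mirroring.
#   has_space /path/to/local/cvs/mirror $((6 * 1024 * 1024)) \
#       || echo "not enough space"
```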

>> We also have a Subversion repository now, that you can use to grab
>> commit information.  It takes slightly more disk space than the CVS
>> repository, but subversion can export XML formatted commit logs, which
>> may be slightly more useful if you plan to automate parts of the
>> parsing and info-gathering.
> Yes, I'll definitely be automating the parsing, etc. Is it safe to
> assume that the cvs2svn migration went successfully? XML logs do sound
> appealing and aggregated (same time, multiple files) commits would be
> more useful than per-file. Can I just check everything out from
> svn://svn.freebsd.org/base/?

The conversion from CVS to Subversion was ``good enough'' from what I
see in the svn commit logs.  So it may be a good idea to use `svnsync'
to mirror the /base/ repository locally and take it from there.
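
Once you have a mirror (or while working directly against the server),
pulling the commit logs as XML might look roughly like this.
`svn log --xml -v' reports author, date and changed paths for each
revision; the function name, the date in the example and the `SVN'
override variable are illustrative assumptions:

```sh
#!/bin/sh
# Sketch: dump verbose commit logs as XML from a start date to HEAD.
# SVN can be overridden (an assumption, handy for dry runs).
: "${SVN:=svn}"

dump_svn_log() {
    url=$1       # e.g. svn://svn.freebsd.org/base or a file:// mirror
    since=$2     # start date, YYYY-MM-DD
    # --xml gives machine-readable output; -v adds the changed paths.
    "$SVN" log --xml -v -r "{$since}:HEAD" "$url"
}

# Example:
#   dump_svn_log svn://svn.freebsd.org/base 2007-04-27 > base-log.xml
```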

The instructions for mirroring the Subversion repository are a bit more
involved, but if you decide to go that way, let me know and I will write
a short description of how to do it.
