Johan Herland <jo...@herland.net>:
> > Alan and I are going to take a good hard whack at modifying cvs-fast-export
> > to make this work. Because there really aren't any feasible alternatives.
> > The analysis code in cvsps was never good enough. cvs2git, being written
> > in Python, would hit the core limit faster than anything written in C.
> Depends on how it organizes its data structures. Have you actually
> tried running cvs2git on it? I'm not saying you are wrong, but I had
> similar problems with my custom converter (also written in Python),
> and solved them by adding multiple passes/phases instead of trying to
> do too much work in fewer passes. In the end I ended up storing the
> largest inter-phase data structures outside of Python (sqlite in my
> case) to save memory. Obviously it cost a lot in runtime, but it meant
> that I could actually chew through our largest CVS modules without
> running out of memory.
You make a good point. cvs2git is descended from cvs2svn, which has
such a multipass organization - it will only have to avoid memory
limits per pass. Alan and I will try that as a fallback if
cvs-fast-export continues to choke.
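
For illustration, here is a minimal sketch of the multipass idea Johan
describes: one pass streams revision records into SQLite, and the next
pass reads them back in sorted order without holding them all in Python
memory. The table layout, column names, and function names are
assumptions made up for this example; they come from neither his
converter nor cvs2git.

```python
import sqlite3

def pass1_collect(db, revisions):
    """Pass 1: stream per-file revision records into SQLite.

    revisions: iterable of (path, revno, author, timestamp, log) tuples
    (a hypothetical record shape chosen for this sketch).
    """
    db.execute(
        "CREATE TABLE IF NOT EXISTS rev "
        "(path TEXT, revno TEXT, author TEXT, ts INTEGER, log TEXT)"
    )
    db.executemany("INSERT INTO rev VALUES (?, ?, ?, ?, ?)", revisions)
    db.commit()

def pass2_by_time(db):
    """Pass 2: yield records in timestamp order; SQLite does the sorting."""
    yield from db.execute(
        "SELECT path, revno, author, ts, log FROM rev ORDER BY ts, path"
    )

# ":memory:" keeps the demo self-contained; a real run would pass a file
# path so the inter-pass data lives on disk rather than in RAM.
db = sqlite3.connect(":memory:")
pass1_collect(db, [("a.c", "1.2", "esr", 200, "fix"),
                   ("a.c", "1.1", "esr", 100, "init")])
rows = list(pass2_by_time(db))
```

The trade is the one Johan names: more runtime spent on I/O in exchange
for a memory ceiling that applies per pass rather than to the whole run.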
> > It is certainly the case that a sufficiently large CVS repo will break
> > anything, like a star with a mass over the Chandrasekhar limit becoming a
> > black hole :-)
> :) True, although it's not the sheer size of the files themselves that
> is the actual problem. Most of those bytes are (deltified) file data,
> which you can pretty much stream through and convert to a
> corresponding fast-export stream of blob objects. The code for that
> should be fairly straightforward (and should also be eminently
> parallelizable, given enough cores and available I/O), resulting in a
> table mapping CVS file:revision pairs to corresponding Git blob SHA1s,
> and an accompanying (set of) packfile(s) holding said blobs.
Allowing for the fact that cvs-fast-export isn't git and doesn't use
SHA1s or packfiles, this is in fact how a large portion of
cvs-fast-export works. The blob files get created during the walk
through the master file list, before actual topo analysis is done.
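
A rough sketch of the streaming step as Johan describes it (not of
cvs-fast-export's internals, which as noted use neither SHA1s nor
packfiles): each revision's content is written as a fast-import "blob"
command, and a table maps (file, revision) to its mark and Git blob ID.
The input tuple shape and function names are assumptions for this
example, and the deltified RCS text is assumed to be already expanded.

```python
import hashlib
import io

def git_blob_sha1(data: bytes) -> str:
    """Git's blob ID: SHA-1 over 'blob <size>' + NUL + content."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()

def emit_blobs(revisions, out):
    """Write one fast-import blob command per revision; return the table.

    revisions: iterable of (path, revno, content_bytes), a hypothetical
    input shape for this sketch.
    """
    table = {}
    for mark, (path, revno, data) in enumerate(revisions, start=1):
        out.write(b"blob\nmark :%d\ndata %d\n" % (mark, len(data)))
        out.write(data + b"\n")  # optional LF after the data payload
        table[(path, revno)] = (mark, git_blob_sha1(data))
    return table

out = io.BytesIO()
table = emit_blobs([("a.c", "1.1", b"hi\n")], out)
```

Since each revision is handled independently, this loop is the part
that could be split across workers, as suggested above.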
> The hard part comes when trying to correlate the metadata for all the
> per-file revisions, and distill that into a consistent sequence/DAG of
> changesets/commits across the entire CVS repo. And then, of course,
> trying to fit all the branches and tags into that DAG of commits is
> what really drives you mad... ;-)
Well, I know this... :-)
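
To make the correlation problem concrete, here is a deliberately naive
sketch of one common heuristic: group per-file revisions that share an
author and log message and fall within a time window. This is an
assumption-laden illustration, not the algorithm used by
cvs-fast-export, cvsps, or cvs2git; the 300-second window and the record
shape are invented for the example, and it ignores branches and tags
entirely, which is exactly the part that drives you mad.

```python
def coalesce(revisions, window=300):
    """Group per-file revisions into changesets.

    revisions: iterable of (path, revno, author, timestamp, log).
    A revision joins the open changeset for its (author, log) key when
    its path is not already in that changeset and its timestamp is
    within `window` seconds of the changeset's latest member.
    """
    changesets = []
    open_sets = {}  # (author, log) -> most recent changeset for that key
    for path, revno, author, ts, log in sorted(revisions, key=lambda r: r[3]):
        key = (author, log)
        cs = open_sets.get(key)
        if cs is None or ts - cs["last"] > window or path in cs["files"]:
            # Start a new changeset: no candidate, too late, or same file twice.
            cs = {"author": author, "log": log, "last": ts, "files": {}}
            open_sets[key] = cs
            changesets.append(cs)
        cs["files"][path] = revno
        cs["last"] = ts
    return changesets

commits = coalesce([
    ("a.c", "1.1", "esr", 100, "init"),
    ("b.c", "1.1", "esr", 105, "init"),
    ("a.c", "1.2", "esr", 110, "init"),
])
```

Here the third revision starts a second changeset, because a single
commit cannot contain two revisions of the same file.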
> > The question is how common such supermassive cases are. My own guess is that
> > the *BSD repos and a handful of the oldest GNU projects are pretty much the
> > whole set; everybody else converted to Subversion within the last decade.
> You may be right. At least for the open-source cases. I suspect
> there's still a considerable number of huge CVS repos within
> companies' walls...
If people with money want to hire me to slay those beasts, I'm available.
I'm not proud, I'll use cvs2git if I have to.
> > I find the very idea of writing anything that encourages
> > non-history-correct conversions disturbing and want no part of it.
> > Which matters, because right now the set of people working on CVS lifters
> > begins with me and ends with Michael Rafferty (cvs2git),
Yup, I thinkoed.
> > who seems even
> > less interested in incremental conversion than I am. Unless somebody
> > comes out of nowhere and wants to own that problem, it's not going
> > to get solved.
> Agreed. It would be nice to have something to point to for people that
> want something similar to git-svn for CVS, but without a motivated
> owner, it won't happen.
I think the fact that it hasn't happened already is a good clue that
it's not going to. Given the decline curve of CVS usage, writing
git-cvs might have looked like a decent investment of time once,
but that era probably ended five to eight years ago.
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>