[gentoo-dev] CVS -> git, list of where non-infra folk can contribute

Brian Harring Mon, 01 Oct 2012 21:15:52 -0700

Cross-posting to scm; responses should go to scm please (and the 
people who whinge about cross posting should go promptly to hell if 
I have any say in the matter).

On Mon, Oct 01, 2012 at 05:58:43PM -0700, Diego Elio Pettenò wrote:
> On 01/10/2012 17:51, Gregory M. Turner wrote:
> > 
> > Anyhow, I get it: administering the vcs for a huge project such as
> > Gentoo is very hard work.  If I somehow gave some other impression, I'm
> > sorry.   Perhaps Rich and I insensitively voiced our shared assumption
> > that Gentoo's continued reliance on cvs stems from a lack of motivation
> > and consensus, rather than a shortage of labor and resources. 
> 
> That's definitely not the case. While we have had some complaints
> (mostly from Prefix last I knew) about git's working, the consensus for
> going to git is there. The problems are vastly technical.
> 
> Problems such as "how many developers would be fine with having to
> checkout 2GB of history to be able to commit"? git supports shallow
> clones but not if you want to commit to them.

A few corrections:
1) You can commit to shallow clones.  You can actually push from them 
too- you just have to know what you're doing (your parent *has* to be 
known to the other side, else you're trying to push a disconnected 
history/graph to the other side, which doesn't know how to connect the 
two).  We won't be doing that fortunately, just noting that it is 
possible if you're careful (and I know what the man page says; what 
I'm saying is the full version, rather than the short version they 
list there).

2) Grafts are what we'll be doing there; kind of shallow, but not.  
Basically the same thing the kernel folk did.

As for the "quit your bitching and contribute already" rant angle; 
Diego's accurate; minimally, it's more productive to contribute and 
you're less likely to crap on folks' motivation, let alone risk the 
wrath of a pissy person like me yelling at you.

Herein is the kicker; certain chunks of this can't be handled by 
random joe blow off the street- they require core infra access.  

Bluntly (no disrespect to people, just being brutally direct) I don't 
care if you have infra friends, I don't care if you maintain a couple 
of boxes; if you're doing heavy OPs in a production environment, 
you'll understand the issue of trust/access- thus you'll understand 
that some of this work cannot be done by anyone but infra.

Like it or not, very few people have access to the core cvs -> rsync 
hosts/machinery- since each/every/one/of/us means it's a security 
angle that has to be tracked.  That's not arguable, so don't even try 
please.

That said, there are non-infra contributions people can make.

I suggest people do that; here's the list off the top of my head 
(these are things worst case, I'll sort- which means it'll be months 
out till I finish them considering my own time constraints and focus 
on getting eapi5 support into pkgcore first).

0) First the rules of the road for this discussion; assume that I'll 
be bitchy if you violate this.

0.a) We're not dropping the existing history.  Suggesting this is 
asking for a killfile entry, it's viable for small or throw-away 
projects; gentoo-x86 cvs repository is not a throw-away project.

0.b) Lesser offence since it's not obvious; the various suggestions 
that we just snapshot this, then try to fix history after the fact 
won't work- look into git's transitive trust via sha1's of the 
parent's sha1.  To do that sort of proposal means forcing a full 
history rewrite down the line; this doesn't fly.

0.c) For whatever I've missed, assume that if it craps on developers' 
workflow... it's a no go, and needs to be addressed.  Does CVS suck?  
Yes, I hate having to use it.  But it *works*; switching to git has to 
be, minimally, a lateral move for developers in terms of their 
workflow- we cannot make it worse else what's the point of this whole 
exercise?  There may be an exception or two here- things that aren't 
sorted immediately upon conversion, but those exceptions will only fly 
if they're minor, don't require history rewrites, and someone is 
locked in/guaranteed to be working on it now (else we have no guarantee 
it'll actually be sorted).
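
To make 0.b concrete, here is a minimal sketch of why attaching history 
after the fact forces a rewrite.  It hashes objects the same way git does 
(sha1 over a "<type> <size>" header, a NUL byte, then the body).  The 
commit contents, the identity line, and the function names are made up 
for illustration; this is not a conversion tool.

```python
import hashlib
from typing import Sequence

# Well-known id of git's empty tree object.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def git_object_id(obj_type: str, body: bytes) -> str:
    """sha1 over '<type> <size>NUL<body>', which is how git names objects."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def commit_id(tree: str, parents: Sequence[str], message: str) -> str:
    """Id of a minimal commit; the identity line is made up for the demo."""
    who = "Example Dev <dev@example.org> 1349151352 -0700"
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {who}", f"committer {who}", "", message, ""]
    return git_object_id("commit", "\n".join(lines).encode())


def demo() -> None:
    # "Snapshot now": the first git commit has no parent.
    snapshot = commit_id(EMPTY_TREE, [], "snapshot of cvs")
    later = commit_id(EMPTY_TREE, [snapshot], "first real git commit")

    # "Fix history afterwards": the snapshot gains a parent (the old
    # history tip), so its id changes, and so does every commit on top.
    old_tip = commit_id(EMPTY_TREE, [], "converted cvs history tip")
    snapshot2 = commit_id(EMPTY_TREE, [old_tip], "snapshot of cvs")
    later2 = commit_id(EMPTY_TREE, [snapshot2], "first real git commit")

    print("snapshot changed:", snapshot != snapshot2)
    print("descendant changed:", later != later2)


if __name__ == "__main__":
    demo()
```

Since the parent id is part of the hashed body, changing what a commit 
points back to changes its own id and the id of everything built on it.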

1) We need a thin manifest -> thick manifest converter.  Thin 
manifests are used for git- they store just DIST entries.  Thick (also 
known as 'full'), are what cvs/rsync users are familiar with- it holds 
checksums for all content.

1.a) This converter must use portage APIs; ultimately, this 
thin->thick conversion will be signed by an infra key (rather than the 
current hodgepodge of devs).  I suggest nesting it under the emaint 
command.

1.b) This converter needs to be fast.  $VCS -> rsync updates occur 
every 30 minutes.  thin/thick sorting should be sub minute, frankly; 
go parallel (multiprocessing) being my suggestion, threadpool worst 
case (since most of the work won't be gil bound).

1.c) This absolutely has to be fucking stable.  This will be a core 
part of our infrastructure after all.

1.d) I will kneecap the first person who whines about portage on this, 
or suggests NIH "let's just hack it"- they won't have to support it, 
this goes into portage so it's proper, and so infra isn't stuck w/ 
more custom code.

1.e) This actually isn't that hard.  Ask in #gentoo-portage for 
details, look at portage source, look at repoman's existing manifest 
command- that manifest command already is the basics of it.
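
For anyone unclear on what thin vs thick means in practice, a rough 
sketch of the shape of the job follows.  To be loud about it: this does 
NOT use portage's APIs, so per 1.a it is not what should be shipped.  
The hash set, the function names, and the file classification rules are 
assumptions for illustration only.  It keeps the DIST lines from the thin 
Manifest, adds checksum lines for everything else in the package 
directory, and fans the work out across a pool as 1.b suggests.

```python
import hashlib
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

# Assumption: the real hash set is whatever portage is configured to use.
HASHES = ("sha256", "sha512")


def entry(kind, name, path):
    """One Manifest-style line: KIND name size HASH value [HASH value ...]."""
    with open(path, "rb") as f:
        data = f.read()
    sums = " ".join(
        f"{h.upper()} {hashlib.new(h, data).hexdigest()}" for h in HASHES
    )
    return f"{kind} {name} {len(data)} {sums}"


def thicken(pkgdir):
    """Return thick Manifest text for one package directory."""
    thin = os.path.join(pkgdir, "Manifest")
    lines = []
    if os.path.exists(thin):
        # A thin Manifest only carries DIST lines; keep them untouched.
        with open(thin) as f:
            lines = [ln.rstrip("\n") for ln in f if ln.startswith("DIST ")]
    for root, _dirs, files in os.walk(pkgdir):
        for name in files:
            path = os.path.join(root, name)
            rel = os.path.relpath(path, pkgdir)
            if rel == "Manifest":
                continue
            parts = rel.split(os.sep)
            if parts[0] == "files" and len(parts) > 1:
                lines.append(entry("AUX", "/".join(parts[1:]), path))
            elif len(parts) == 1 and name.endswith(".ebuild"):
                lines.append(entry("EBUILD", name, path))
            else:
                lines.append(entry("MISC", rel, path))
    # Plain sorting happens to group lines as AUX, DIST, EBUILD, MISC.
    return "\n".join(sorted(lines)) + "\n"


def thicken_tree(pkgdirs, workers=4, use_threads=False):
    """Map thicken() over many packages; processes by default, threads optional."""
    pkgdirs = list(pkgdirs)
    pool_cls = ThreadPoolExecutor if use_threads else ProcessPoolExecutor
    with pool_cls(max_workers=workers) as pool:
        return dict(zip(pkgdirs, pool.map(thicken, pkgdirs)))
```

Each package directory is independent of the others, which is what makes 
the process pool (or the threadpool fallback) straightforward here.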

1.5) Incremental signing of a tree is basically required; meaning 
whatever scanner there is, shouldn't require resigning every single 
package, only those that have changed thick manifest wise.

1.6) Anyone looking to do this should pop into #gentoo-portage, talk 
w/ a user named 'carebear', zmedico, etc; zmedico is portage's 
maintainer, carebear is the current person volunteering to sort this 
(help may be appreciated, talk to him/her/it).
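
As a sketch of the incremental part from 1.5: one way to avoid resigning 
everything is to remember a digest of each package's last signed thick 
manifest and only sign where the digest differs.  The state layout and 
the function name below are hypothetical, and the actual signing step is 
left out on purpose.

```python
import hashlib


def packages_to_sign(manifests, previous):
    """Work out which packages need a fresh signature.

    manifests: {package: thick manifest text} for the current tree
    previous:  {package: sha256 hex digest} recorded at the last signing run
    Returns (to_sign, removed, new_state).
    """
    new_state = {
        pkg: hashlib.sha256(text.encode()).hexdigest()
        for pkg, text in manifests.items()
    }
    # New packages and packages whose thick manifest changed.
    to_sign = sorted(
        pkg for pkg, digest in new_state.items() if previous.get(pkg) != digest
    )
    # Packages that vanished from the tree since the last run.
    removed = sorted(set(previous) - set(new_state))
    return to_sign, removed, new_state
```

Feed new_state back in as previous on the next run; an unchanged tree 
then yields an empty to_sign list.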

2) Building off of #1, although *NOT REQUIRED FOR CVS->GIT MIGRATION*, 
just very strongly desired, is sorting tree signing gleps while we're 
at it.  Start from http://www.gentoo.org/proj/en/glep/glep-0057.html ; 
whatever solution #1 takes (likely an emaint command), tree signing 
will be built right smack dab into it.

3) Robin afaik is putting together an email with the details; roughly, 
the conversion process is conversion of cvs to svn, then svn2git 
conversion; this is done since frankly it's the best/sanest conversion 
pathway, and the fastest.  The validation of that conversion, and 
getting it down to basically a set of known invocations is required.

3.a) Roughly, the plan will be snag the tree, start conversion.  
Validate the results, repeat as necessary till we're happy with it.  
This is the initial git core history.  This step should be <8h; mostly 
cpu time, frankly, although re-validation of that pathway is required 
(I did a fair amount of optimization to this, but I've not rechecked 
the runtime in a while- nor if there is a better option in existence).  
Basically, it's strongly preferable we're not sorting this at the time 
we're trying to do the live conversion- the core issues need to be 
sorted before.

3.b) Take all cvs activity that has occurred since the tree was 
snapshotted and conversion started, and replay it into git via tailor; 
this is minor- and avoidable if we just shut the tree down for however 
long 3.a takes; that said, the tailor route is the intention, and 
shouldn't be a problem.

4) People who strongly know git hooks would be useful; server side, 
all incoming pushes from devs will have their commits validated before 
touching the tree- bad validation, commit gets kicked back to them.  
The hooks for this need doing (development of this can be done locally 
w/out having to access infra either).  Hell, someone may already have 
done something similar- I've not seen it, but we need something akin 
to this; whoever does this, needs to write it such that the auth 
backend is configurable (upon deployment, this will be bound into 
ldap, or an ldap scraped set of data that it'll consult); assume that 
the auth backend will be user->gpg key level of validation (meaning I 
cannot take a random commit antarus had against current ToT, and push 
that on his behalf- robin may disagree on this point however).
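
A minimal sketch of the check described in 4, assuming the hook has 
already worked out who is pushing and which key signed each incoming 
commit (how it extracts that from git is not shown).  The Commit type, 
the backend shape, and the fingerprints are all hypothetical; the point 
is only that the auth lookup is a pluggable callable, so a static table 
can be swapped for an ldap-backed one at deployment.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set


@dataclass(frozen=True)
class Commit:
    sha: str
    signer_fpr: str  # fingerprint of the signing key, "" if unsigned


# Auth backend: maps a user name to the key fingerprints they may push with.
KeyLookup = Callable[[str], Set[str]]


def static_backend(table: Dict[str, Iterable[str]]) -> KeyLookup:
    """Backend for local development; deployment would swap in another."""
    def lookup(user: str) -> Set[str]:
        return set(table.get(user, ()))
    return lookup


def rejected_commits(pusher: str, commits: Iterable[Commit],
                     lookup: KeyLookup) -> List[str]:
    """Return shas of commits not signed by one of the pusher's own keys."""
    allowed = lookup(pusher)
    return [c.sha for c in commits if c.signer_fpr not in allowed]


if __name__ == "__main__":
    backend = static_backend({"ferringb": ["FPR-A"], "antarus": ["FPR-B"]})
    incoming = [Commit("c1", "FPR-A"), Commit("c2", "FPR-B")]
    bad = rejected_commits("ferringb", incoming, backend)
    # A server-side hook rejects the push by exiting non-zero.
    raise SystemExit(1 if bad else 0)
```

In the demo, the commit signed with antarus's key is kicked back when 
ferringb pushes it, which is the user->gpg key rule described above.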

Were that to be done, that would leave for infra basically the 
following- which is most definitely not a complete list-

1) gitolite configuration/setup, which afaik is basically sorted.
2) cvs -> rsync pathways being rebuilt to be git -> rsync (reliant on 
#1 from above, but there is more that occurs there).
3) Thanking people for stepping up and helping to take care of the 
stuff we're seriously low on time to sort.

People don't step up, I'll be working my way through that list; that 
said, my timetable were I to do this isn't "next week or the week 
after"- it's "over the next few months as time allows".

Also, it's entirely possible I missed something for the non-infra 
tasks people can contribute to; that's just a quick brain dump, pardon 
any incorrect statements.  If one has questions and answers aren't 
coming through via the scm ml, then worst case track me down on 
freenode via the ferringb nick; just assume I'll be wickedly laggy 
in responding.

Finally, pardon the strong tones; the tone in use isn't meant to 
dissuade people from contributing, it's meant to ensure people stay 
focused on what's required here to get the job done- discussions about 
building a git mirroring tier (for example) are for *after* the 
initial work is done (understand that 99% of users will be using rsync 
even when we switch dev's underlying vcs to git; longer term that may 
change, but it's a v2 type thing, not a v1 type thing).

Cheers-
~harring
