Cross-posting to scm; responses should go to scm please (and the people who whinge about cross posting should go promptly to hell if I have any say in the matter).
On Mon, Oct 01, 2012 at 05:58:43PM -0700, Diego Elio Pettenò wrote:
> On 01/10/2012 17:51, Gregory M. Turner wrote:
> >
> > Anyhow, I get it: administering the vcs for a huge project such as
> > Gentoo is very hard work. If I somehow gave some other impression, I'm
> > sorry. Perhaps Rich and I insensitively voiced our shared assumption
> > that Gentoo's continued reliance on cvs stems from a lack of motivation
> > and consensus, rather than a shortage of labor and resources.
>
> That's definitely not the case. While we do have had some complains
> (mostly from Prefix last I knew) about git's working, the consensus for
> going to git is there. The problems are vastly technical.
>
> Problems such as "how many developers would be fine with having to
> checkout 2GB of history to be able to commit"? git support shallow
> clones but not if you want to commit to them.

Few corrections;

1) You can commit to shallow clones. You can actually push from them
too- you just have to know what you're doing (your parent *has* to be
known to the other side, else you're trying to push a disconnected
history/graph to the other side, which doesn't know how to connect the
two). We won't be doing that fortunately, just noting that it is
possible if you're careful (and I know what the man page says; what I'm
saying is the full version, rather than the short version they list
there).

2) Grafts are what we'll be doing there; kind of shallow, but not.
Basically the same thing the kernel folk did.

As for the "quit your bitching and contribute already" rant angle;
Diego's accurate; minimally, it's more productive to contribute and
you're less likely to crap on folks' motivation, let alone risk the
wrath of a pissy person like me yelling at you.

Herein is the kicker; certain chunks of this can't be handled by random
joe blow off the street- they require core infra access.
Bluntly (no disrespect to people, just being brutally direct) I don't
care if you have infra friends, I don't care if you maintain a couple
of boxes; if you're doing heavy ops in a production environment, you'll
understand the issue of trust/access- thus you'll understand that some
of this work cannot be done by anyone but infra.

Like it or not, very few people have access to the core cvs -> rsync
hosts/machinery- since each/every/one/of/us with access is a security
angle that has to be tracked. That's not arguable, so don't even try
please.

That said, there are non-infra contributions people can make. I suggest
people do that; here's the list off the top of my head (these are things
that, worst case, I'll sort- which means it'll be months out till I
finish them considering my own time constraints and focus on getting
eapi5 support into pkgcore first).

0) First the rules of the road for this discussion; assume that I'll be
bitchy if you violate these.

0.a) We're not dropping the existing history. Suggesting this is asking
for a killfile entry; it's viable for small or throw-away projects, but
the gentoo-x86 cvs repository is not a throw-away project.

0.b) Lesser offence since it's not obvious; the various suggestions that
we just snapshot this, then try to fix history after the fact won't
work- look into git's transitive trust, where each commit's sha1 covers
its parent's sha1. To do that sort of proposal means forcing a full
history rewrite down the line; this doesn't fly.

0.c) For whatever I've missed, assume that if it craps on developers'
workflow... it's a no go, and needs to be addressed. Does CVS suck? Yes,
I hate having to use it. But it *works*; switching to git has to be,
minimally, a lateral move for developers in terms of their workflow- we
cannot make it worse, else what's the point of this whole exercise?
There may be an exception or two here- things that aren't sorted
immediately upon conversion, but those exceptions will only fly if
they're minor, don't require history rewrites, and someone is locked
in/guaranteed to be working on it now (else we have no guarantee it'll
actually be sorted).

1) We need a thin manifest -> thick manifest converter. Thin manifests
are used for git- they store just DIST entries. Thick (also known as
'full') manifests are what cvs/rsync users are familiar with- they hold
checksums for all content.

1.a) This converter must use portage APIs; ultimately, this thin->thick
conversion will be signed by an infra key (rather than the current
hodgepodge of devs). I suggest nesting it under the emaint command.

1.b) This converter needs to be fast. $VCS -> rsync updates occur every
30 minutes. thin/thick sorting should be sub minute, frankly; go
parallel (multiprocessing) being my suggestion, threadpool worst case
(since most of the work won't be gil bound).

1.c) This absolutely has to be fucking stable. This will be a core part
of our infrastructure after all.

1.d) I will kneecap the first person who whines about portage on this,
or suggests NIH "let's just hack it"- they won't have to support it,
this goes into portage so it's proper, and so infra isn't stuck w/ more
custom code.

1.e) This actually isn't that hard. Ask in #gentoo-portage for details,
look at portage source, look at repoman's existing manifest command-
that manifest command already is the basics of it.

1.5) Incremental signing of a tree is basically required; meaning
whatever scanner there is shouldn't require resigning every single
package, only those that have changed thick manifest wise.

1.6) Anyone looking to do this should pop into #gentoo-portage, talk w/
a user named 'carebear', zmedico, etc; zmedico is portage's maintainer,
carebear is the current person volunteering to sort this (help may be
appreciated, talk to him/her/it).
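
To make the shape of 1, 1.b and 1.5 concrete, here is a rough Python
sketch. It is deliberately NOT the real converter: 1.a requires going
through portage's own manifest code, and this uses only the standard
library. The entry types (AUX/EBUILD/MISC), the line layout, the hash
choice, and every function name (entry_for, thicken, thicken_tree,
needs_resign) are assumptions for illustration only. It defaults to a
thread pool, the fallback mentioned in 1.b; a process pool class can be
passed in instead.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor

# Assumption: the hash set is whatever the tree policy mandates.
HASHES = ("sha256", "sha512")


def entry_for(pkg_dir, path):
    """Build one thick-manifest style line for a file (assumed layout)."""
    rel = os.path.relpath(path, pkg_dir).replace(os.sep, "/")
    if rel.startswith("files/"):
        kind, name = "AUX", rel[len("files/"):]
    elif rel.endswith(".ebuild"):
        kind, name = "EBUILD", rel
    else:
        kind, name = "MISC", rel
    with open(path, "rb") as f:
        data = f.read()
    sums = " ".join(
        f"{h.upper()} {hashlib.new(h, data).hexdigest()}" for h in HASHES
    )
    return f"{kind} {name} {len(data)} {sums}"


def thicken(pkg_dir):
    """Keep the thin manifest's DIST lines, add checksums for the rest."""
    manifest = os.path.join(pkg_dir, "Manifest")
    lines = []
    if os.path.exists(manifest):
        with open(manifest) as f:
            lines = [l.rstrip("\n") for l in f if l.startswith("DIST ")]
    for root, _dirs, names in os.walk(pkg_dir):
        for name in names:
            path = os.path.join(root, name)
            if path != manifest:
                lines.append(entry_for(pkg_dir, path))
    return pkg_dir, sorted(lines)


def thicken_tree(pkg_dirs, workers=None, executor_cls=ThreadPoolExecutor):
    """Process package directories in parallel; returns {pkg_dir: lines}."""
    with executor_cls(max_workers=workers) as pool:
        return dict(pool.map(thicken, pkg_dirs))


def needs_resign(previous, current):
    """Packages whose thick content changed since the last run (1.5)."""
    return sorted(p for p, lines in current.items() if previous.get(p) != lines)
```

Calling thicken_tree() on a list of package directories and comparing
the result against the previous run with needs_resign() gives the set
of packages that would have to be re-signed; everything else is left
alone.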

2) Building off of #1, although *NOT REQUIRED FOR CVS->GIT MIGRATION*,
just very strongly desired, is sorting the tree signing GLEPs while
we're at it. Start from
http://www.gentoo.org/proj/en/glep/glep-0057.html ; whatever solution
#1 takes (likely an emaint command), tree signing will be built right
smack dab into it.

3) Robin afaik is putting together an email with the details; roughly,
the conversion process is conversion of cvs to svn, then svn2git
conversion; this is done since frankly it's the best/sanest conversion
pathway, and the fastest. The validation of that conversion, and
getting it down to basically a set of known invocations, is required.

3.a) Roughly, the plan will be snag the tree, start conversion.
Validate the results, repeat as necessary till we're happy with it.
This is the initial git core history. This step should be <8h; mostly
cpu time, frankly, although re-validation of that pathway is required
(I did a fair amount of optimization to this, but I've not rechecked
the runtime in a while- nor if there is a better option in existence).
Basically, it's strongly preferable we're not sorting this at the time
we're trying to do the live conversion- the core issues need to be
sorted before.

3.b) Take all cvs activity that has occurred since the tree was
snapshotted and conversion started, and replay it into git via tailor;
this is minor- and avoidable if we just shut the tree down for however
long 3.a takes; that said, the tailor route is the intention, and
shouldn't be a problem.

4) People who strongly know git hooks would be useful; server side, all
incoming pushes from devs will have their commits validated before
touching the tree- bad validation, commit gets kicked back to them. The
hooks for this need doing (development of this can be done locally
w/out having to access infra either).
Hell, someone may already have done something similar- I've not seen
it, but we need something akin to this; whoever does this needs to
write it such that the auth backend is configurable (upon deployment,
this will be bound into ldap, or an ldap scraped set of data that it'll
consult); assume that the auth backend will be user->gpg key level of
validation (meaning I cannot take a random commit antarus had against
current ToT, and push that on his behalf- robin may disagree on this
point however).

Were that to be done, that would leave for infra basically the
following- which is most definitely not a complete list-

1) gitolite configuration/setup, which afaik is basically sorted.

2) cvs -> rsync pathways being rebuilt to be git -> rsync (reliant on
#1 from above, but there is more that occurs there).

3) Thanking people for stepping up and helping to take care of the
stuff we're seriously low on time to sort.

If people don't step up, I'll be working my way through that list; that
said, my timetable were I to do this isn't "next week or the week
after"- it's "over the next few months as time allows".

Also, it's entirely possible I missed something for the non-infra tasks
people can contribute to; that's just a quick brain dump, pardon any
incorrect statements.

If one has questions and answers aren't coming through via the scm ml,
then worst case track me down on freenode via the ferringb nick; just
assume I'll be wickedly laggy in responding.

Finally, pardon the strong tones; the tone in use isn't meant to
dissuade people from contributing, it's meant to ensure people stay
focused on what's required here to get the job done- discussions about
building a git mirroring tier (for example) are for *after* the initial
work is done (understand that 99% of users will be using rsync even
when we switch devs' underlying vcs to git; longer term that may
change, but it's a v2 type thing, not a v1 type thing).

Cheers-
~harring