Thanks for that information -- I didn't know about clone --depth. Well, in this particular case, once the git daemon was configured correctly, I was able to make git archive work just fine, so that scalability issue is resolved. git archive is very fast, since there's no history involved.
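For reference, here is a minimal sketch of the kind of history-free export described above. It assumes the server allows archive requests at all: git daemon leaves the upload-archive service disabled by default, and it can be turned on with "git daemon --enable=upload-archive" or per repository with daemon.uploadarch. The function name export_tree and the git:// URL in the usage line are hypothetical, not something taken from the EFS code.

```shell
#!/bin/sh
# Sketch only: export one tree from a repository without any history.
# usage: export_tree <repo-url-or-path> <tree-ish> <dest-dir>
# (export_tree is a hypothetical helper name, not part of efsdeploy.)
export_tree() {
    repo=$1
    treeish=$2
    dest=$3

    mkdir -p "$dest" || return 1
    tmp=$(mktemp) || return 1

    # git archive writes a tar of the requested tree; no .git metadata
    # is included. Going through a temp file lets us see git's own exit
    # status instead of only tar's.
    git archive --format=tar --remote="$repo" "$treeish" > "$tmp" &&
        tar -x -f "$tmp" -C "$dest"
    status=$?

    rm -f "$tmp"
    return $status
}

# Example (hypothetical URL):
#   export_tree git://git.openefs.org/deploy-config master ./common
```

Because only the current tree is transferred, the cost should not grow with the length of the repository history, unlike "git clone" followed by "rm -r .git".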
However, I do agree that git as the primary software distribution mechanism is not a very good design. Among my many (too many) side projects, I'm working on some post-receive hooks to automatically publish the deploy-config data as timestamped tarballs (these git repos are a little special). I have reworked the code that manages the git repos, and once I get that deployed, I'll be able to introduce some more hooks to do creative things. The efsdeploy_config_update script, which right now just supports downloading via git, will be extended to support downloading the published releases as well.

If I may fantasize about actually having some users some day (pause to stare out the window and dream....) I would expect sites that are actively involved in EFS development to use the git based method, but sites that just want to use the published data would use the tarballs.

In any event, the mechanism's evolving very rapidly, and I'm still in the experimental proof of concept phase with respect to managing the efsdeploy config rules. I think this model is going to work, but time will tell...

On Wed, Jan 4, 2012 at 11:11 AM, Kevin Green <[email protected]> wrote:
> Hi,
>
> Well, you could potentially work around the scale issue by using a
> shallow clone (--depth 1). I'll use git.git as an example since you
> have a small history repo with examples below:
>
> $ time git clone --depth 1 git://github.com/git/git.git
> Cloning into 'git'...
> remote: Counting objects: 26189, done.
> remote: Compressing objects: 100% (13591/13591), done.
> remote: Total 26189 (delta 21763), reused 15849 (delta 12193)
> Receiving objects: 100% (26189/26189), 9.05 MiB | 1.13 MiB/s, done.
> Resolving deltas: 100% (21763/21763), done.
>
> real    0m16.753s
> user    0m3.931s
> sys     0m0.609s
> $ cd git
> [master]$ git log
> commit 4570aeb0d85f3b5ff274b6d5a651c2ee06d25d76
> Merge: 228c341 28755db
> Author: Junio C Hamano <[email protected]>
> Date:   Tue Jan 3 14:09:28 2012 -0800
>
>     Merge branch 'pw/p4-docs-and-tests'
>
>     * pw/p4-docs-and-tests:
>       git-p4: document and test submit options
>       git-p4: test and document --use-client-spec
>       git-p4: test --keep-path
>       git-p4: test --max-changes
>       git-p4: document and test --import-local
>       git-p4: honor --changesfile option and test
>       git-p4: document and test clone --branch
>       git-p4: test cloning with two dirs, clarify doc
>       git-p4: clone does not use --git-dir
>       git-p4: introduce asciidoc documentation
>       rename git-p4 tests
>
> commit 228c3418356d06d0596408bee1c863e53ca27d58
> Author: Junio C Hamano <[email protected]>
> Date:   Tue Jan 3 13:48:00 2012 -0800
>
>     Merge branch 'maint'
>
>     * maint:
>       docs: describe behavior of relative submodule URLs
>       fix hang in git fetch if pointed at a 0 length bundle
>       Documentation: read-tree --prefix works with existing subtrees
>       Add MYMETA.json to perl/.gitignore
>
> commit 28755dbaa5213032b2da202652c214a9f94ff853
> Author: Pete Wyckoff <[email protected]>
> Date:   Sat Dec 24 21:07:40 2011 -0500
>
>     git-p4: document and test submit options
>
>     Clarify there is a -M option, but no -C. These are both
>     configurable through variables.
>
>     Explain that the allowSubmit variable takes a comma-separated
>     list of branch names.
>
>     Catch earlier an invalid branch name given as an argument to
>     "git p4 clone".
>
>     Test option --origin, variable allowSubmit, and explicit master
>     branch name.
>
>     Signed-off-by: Pete Wyckoff <[email protected]>
>     Signed-off-by: Junio C Hamano <[email protected]>
>
>
>
> And then compare that with the time to check out the full repo:
>
> [master]$ cd ..
> $ rm -rf git
> $ time git clone git://github.com/git/git.git
> Cloning into 'git'...
> remote: Counting objects: 127389, done.
> remote: Compressing objects: 100% (41918/41918), done.
> remote: Total 127389 (delta 92731), reused 117665 (delta 83665)
> Receiving objects: 100% (127389/127389), 27.95 MiB | 1.35 MiB/s, done.
> Resolving deltas: 100% (92731/92731), done.
>
> real    0m46.661s
> user    0m14.107s
> sys     0m1.865s
>
>
> Since you don't care about the history in your use case, you can use a shallow
> clone to pull down the least amount of data necessary...
>
>
>
> I think the idea of providing a tarball on the server side is the way to go
> though... git really is a distributed code management tool meant for keeping
> track of change. It's not ideally suited for pure distribution. Use the simple
> git-archive (which also will do the gzip compression for you) on the backend,
> auto-generated by a git hook whenever code is updated there and just pull that
> down to the client.
>
> --Kevin
>
> On 01/04/12 10:23:42, Phillip Moore wrote:
>> Well, "git archive" comes very close to what we want, but it only
>> works against remote repositories using ssh, so that's not going to
>> work for any of the real world sites that are using (or hopefully will
>> soon be using) EFS.
>>
>> This really seems like a shortcoming in git, really. If you can
>> anonymously clone an entire repo, it should be easy to get just a
>> working directory for the HEAD of master anonymously, too.
>>
>> I think we need to come up with a mechanism for auto-generating a
>> "latest" tarball for each of these via a commit hook, so I'll go take
>> a look at the code Jerry wrote to implement the hooks we have today,
>> and see how we extend that to add a new one. The creation of the
>> tarball will end up being a VERY short script, since a one-liner with
>> git/gzip can create it.
>>
>> On Wed, Jan 4, 2012 at 9:55 AM, Phillip Moore <[email protected]>
>> wrote:
>> > This is a great idea except that I have no clue how git works,
>> > obviously....
>> >
>> > I had confused "git checkout" with "svn export", and now that I look,
>> > I can't find a way to accomplish this after all. What I wanted might
>> > not be possible with git -- namely a way to download the repo, and
>> > just get a working tree with no repo metadata.
>> >
>> > What I want is the equivalent result of "svn export", which gives you
>> > HEAD of your SVN repo, without all the .svn dirs.
>> >
>> > Now, obviously, you can do this:
>> >
>> >     git clone $url .
>> >     rm -r .git
>> >
>> > But that will NEVER scale, as the size of the git history grows.
>> >
>> > Maybe the better mechanism is to have a commit hook which does this,
>> > and publishes a tarball on ftp.openefs.org with a "latest" symlink.
>> > Then the code can use wget and tar to achieve this goal, rather than
>> > using git directly.
>> >
>> > If one of you knows of a means to do this using git, directly, please
>> > let me know. I will continue researching this...
>> >
>> > On Wed, Jan 4, 2012 at 8:28 AM, Phillip Moore <[email protected]>
>> > wrote:
>> >> I came up with an alternate way to manage deploying these
>> >> deploy-config projects, that will make it trivial to keep them
>> >> up to date, AND deal with the fact that we're managing them in multiple
>> >> repos.
>> >>
>> >> First of all, for flexibility, I'm still going to implement the search
>> >> mechanism for the efsdeploy directory as I described before. However,
>> >> based on the way I've structured the git repos, you can actually do a
>> >> "git checkout" and drop them all into the same root directory.
>> >>
>> >> I'm going to try this today, since it's so damn simple.
>> >> efsdeploy_config_update will be the script that does the following:
>> >>
>> >>     efs create autorelease efs deploy-config
>> >>     cd /efs/dev/efs/deploy-config/next/install/common
>> >>     git checkout http://git.openefs.org/deploy-config
>> >>     git checkout http://git.openefs.org/deploy-config-aix
>> >>     git checkout http://git.openefs.org/deploy-config-gnu
>> >>     ....
>> >>     efs dist autorelease efs deploy-config
>> >>
>> >> Now, you have /efs/dist/efs/deploy-config/current/common with ALL of
>> >> the published git configs.
>> >>
>> >> Note that because ALL of these repos are structured with a
>> >> metaproj/project layout, they can ALL co-exist in the same
>> >> directory tree (if you use checkout, I think -- I haven't tried this
>> >> yet, but since you don't get the .git directory, I don't see why this
>> >> won't work -- I'll figure out how to make it work :-P)
>> >>
>> >> Even better, we can drop a simple file into the root of each repo,
>> >> giving the name of the "child" repos in the obvious hierarchy here.
>> >> For example, in the root of deploy-config, the contents of
>> >> subrepos.txt might be:
>> >>
>> >>     deploy-config-aix
>> >>     deploy-config-gnu
>> >>     deploy-config-rhel
>> >>     deploy-config-sunos
>> >>
>> >> The subrepos.txt file in deploy-config-gnu will have to live in the
>> >> gnu subdir, to avoid clashes, but then, since the top tells us to
>> >> checkout deploy-config-gnu, we then know to look for the next
>> >> subrepos.txt file in ./gnu. This will then contain:
>> >>
>> >>     deploy-config-gnu-gcc
>> >>     deploy-config-gnu-gcclib
>> >>
>> >> This will give us the full flexibility of an easy to use, well managed
>> >> default (you only get the published, committed master branch), with the
>> >> ability to create and manage your own local repos as well.
>> >> For example, there will never be an "fsf" metaproj in the OpenEFS
>> >> namespace, and in practice, you're going to be migrating stuff to gnu,
>> >> I assume, but if you wanted to maintain your own deploy-config-fsf git
>> >> repo, that works fine. You would simply manage it in:
>> >>
>> >>     /efs/dist/fsf/deploy-config-fsf
>> >>
>> >> I can even support publishing this using efsdeploy_config_update via
>> >> CLI args, if you wanted to use the same, simple mechanism.
>> >>
>> >> This is starting to come together very nicely, and now all we really
>> >> need are....
>> >>
>> >> Users :-(
>> >>
>> >> On Fri, Dec 30, 2011 at 12:57 PM, Phillip Moore
>> >> <[email protected]> wrote:
>> >>> On Fri, Dec 30, 2011 at 12:09 PM, Phillip Moore
>> >>> <[email protected]> wrote:
>> >>>> More thoughts, and some significant progress in this area....
>> >>>>
>> >>>> I spent most of yesterday collecting the efsdeploy rules for
>> >>>> EVERYTHING I've built into /efs/dist over the last few months (it's a
>> >>>> lot), by copying the src directory to:
>> >>>>
>> >>>>     ~/dev/efs/deploy-config/$metaproj/$project
>> >>>
>> >>> OK, so once everything in that directory has been sanitized of ALL
>> >>> site-specific information, then we have to figure out how to manage
>> >>> it. Here's what I'm currently thinking, although this is going to
>> >>> evolve, of course.
>> >>>
>> >>> First of all, note that efsdeploy is going to start whining at you to
>> >>> switch from efs/deploy-config to efs/deploy-site, because I want to
>> >>> use the name deploy-config for all of this data. Deal with it....
>> >>> It's *trivial* to switch, and takes about 5-10 minutes, if you type
>> >>> slow.
>> >>>
>> >>> I want to create 3 types of git repo to manage this data:
>> >>>
>> >>>     deploy-config-$metaproj-$project.git
>> >>>     deploy-config-$metaproj.git
>> >>>     deploy-config.git
>> >>>
>> >>> For things like gnu/gcc, we'll obviously create a project-specific git
>> >>> repo, and for large metaprojs where we expect a lot of similarity
>> >>> among the projects, we can create metaproj-specific ones. The
>> >>> default, global git repo would contain all the small, simple stuff,
>> >>> like oss/zlib. For starters, I expect to create these:
>> >>>
>> >>>     deploy-config-gnu-gcc.git (which will be used for rhel/gcc as well)
>> >>>     deploy-config-gnu-gcclib.git (also for rhel/gcclib)
>> >>>     deploy-config-gnu.git
>> >>>     deploy-config-perl5-core.git
>> >>>     deploy-config-perl5.git
>> >>>     deploy-config-apache.git (might get its own system, too -- we'll
>> >>>     see...)
>> >>>
>> >>> And of course the generic one. What I like about this is we can always
>> >>> migrate things from one to the other pretty easily. If we find that,
>> >>> say, oss/openssl has grown complex enough, we can yank it out of
>> >>> deploy-config, and create deploy-config-oss-openssl.
>> >>>
>> >>> So how do we deploy this data? Having it well managed in git is
>> >>> great, but how do we access it when building things with efsdeploy,
>> >>> and where does it get copied/cached?
>> >>>
>> >>> Let's start with the generic repo first.
>> >>> Just as we use efs/deploy-site/current to abstract the site-specific
>> >>> config information, I think we can do the following:
>> >>>
>> >>>     deploy-config.git => /efs/dist/efs/deploy-config/current
>> >>>
>> >>> The metaproj- and project-specific ones would then map to:
>> >>>
>> >>>     deploy-config-$metaproj.git =>
>> >>>         /efs/dist/$metaproj/deploy-config-$metaproj/current
>> >>>     deploy-config-$metaproj-$project.git =>
>> >>>         /efs/dist/$metaproj/deploy-config-$metaproj-$project/current
>> >>>
>> >>> This would allow us to publish, probably date-based, any of these
>> >>> repositories with the "latest" set of efsdeploy build rules.
>> >>> Note that the default rules go into the efs metaproj, obviously, but
>> >>> we can still have a "deploy-config-efs.git" repo if we want, with no
>> >>> conflict.
>> >>>
>> >>> It is very straightforward to code a solution that allows us to
>> >>> automate keeping the local copies of these rules up to date as they
>> >>> change. I will almost certainly have a first pass at this within the
>> >>> next month. However, what is NOT clear is just how to use this
>> >>> information in efsdeploy when building a release.
>> >>>
>> >>> Reproducibility concerns me. The rules are going to evolve, and when
>> >>> we make gnu/gcc rule changes to build, say 4.7.0, we don't want to
>> >>> break builds of 4.4.6, and yet *testing* that is extremely expensive.
>> >>> For that reason, I think the contents of the efsdeploy directory
>> >>> should be CACHED in the release, rather than read from these projects
>> >>> during the build. Just as we are going to provide generic dependency
>> >>> specs (see email from 30 minutes ago), and expand those into
>> >>> specific releasealiases to be used for the duration of the build, I
>> >>> think we should do the same for the project-specific build rules, or
>> >>> at least make it optional.
>> >>>
>> >>> In theory, if we just have efsdeploy search for these rules the same
>> >>> way it searches for system-specific (i.e., gnu, perl5, etc.) rules, and
>> >>> then site-specific rules, then I could actually build EVERYTHING I
>> >>> have in /efs/dist with EMPTY source directories!! If a project is
>> >>> supported by one of these repos, then you can build a new release with
>> >>> nothing more than:
>> >>>
>> >>>     efs create project ...
>> >>>     efs create release ...
>> >>>     cd ..../src
>> >>>     efsdeploy down:up
>> >>>
>> >>> The contents of the src directory would contain NOTHING but the
>> >>> changes you had to make (hooks, configs, whatever) to get the release
>> >>> to build. Those changes should then be re-integrated with the git
>> >>> repo in a controlled fashion, so that the next person building that
>> >>> MPR has no pain. The specific workflow for how a new change gets
>> >>> rolled into the published git repos will need to be worked out, but I
>> >>> think that will be straightforward.
>> >>>
>> >>> Now, obviously, in order to *develop* changes to the rules, we'll need
>> >>> a simple means of overriding the path to these published rules.
>> >>> Maybe you want to install the latest set of gnu/gcc rules, but not
>> >>> make them current until you've actually done a test-build of the
>> >>> releases you care about. Maybe something in efsdeploy.conf (which
>> >>> will now be a site/release-specific file, by definition) like this.
>> >>> Say we wanted to test out some local changes right from the source
>> >>> tree (I've been doing this with symlinks for now):
>> >>>
>> >>>     [rules]
>> >>>     $metaproj/$project = /home/efsops/dev/efs/deploy-config-gnu-gcc
>> >>>
>> >>> or, perhaps, if we use date-based releases, you could install the
>> >>> latest update into /efs/dist, and test it out this way:
>> >>>
>> >>>     [rules]
>> >>>     $metaproj/$project = /efs/dist/gnu/deploy-config-gnu-gcc/20111230
>> >>>
>> >>> Alternately, you could just rsync the efsdeploy directory right into a
>> >>> release, and work with a copy.
>> >>>
>> >>> OK, that's enough of Phil's rantings for one day. Not that anyone's
>> >>> paying attention, but you will see commits that implement many of
>> >>> these features over the next few weeks.
>> _______________________________________________
>> EFS-dev mailing list
>> [email protected]
>> http://mailman.openefs.org/mailman/listinfo/efs-dev
