On Wed, 17 Feb 2010 11:21:51 +1100, Stewart Smith <stew...@flamingspork.com> wrote:
> Using fast-import is interesting. Does it update the working tree? The
> big thing I wanted to avoid was creating a working tree (another million
> inodes being created is not ever what I need)
>
> Also interesting is the mention of creating packs on the fly... this
> could save the time in first writing the object and then packing it (as
> my script does).
>
> I'm going to play with this....
and I did.

Good news... on my mailstore (which, as I've previously mentioned, takes
about 10 minutes to run 'du' over, about the same time as 'notmuch new'
takes), using the (attached) evenless.pl to create a single commit with
everything in it:

$ du -sh .git
3.4G    .git

Down from a whopping 14-15GB!!!

My previous effort (git-write-object, create a pack every 1000 messages,
rinse, repeat) took all night and got to 3.7GB.

This took only 108 minutes. In both cases, I was creating the repository
on another spindle (a USB2.0 disk attached to my laptop).

git-ls-tree and git-cat-file both work for listing and getting objects.

The next thing to think about is adding objects as they come in. Creating
a new commit with just an added file should be pretty simple and easy...
but this means we get to keep a "revision history" of the mailstore, which
is *possibly* not ideal in terms of storage efficiency (I'll do a trial
with mine, adding one message at a time and seeing what the end size is).

However... a commit per added mail (or batch of mails) does give us the
advantage of a really well documented and tested backup system :)

Deleting could be hard, if we actually want the objects to go away in a
"permanent" way (not just no longer be referenced).
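
A minimal sketch of the "new commit with just an added file" idea, under stated assumptions rather than as part of evenless.pl. The function name emit_add_commit, its arguments, and the committer address evenless@example.invalid are hypothetical. It only writes a fast-import stream to stdout; "from refs/heads/master^0" is the form the git-fast-import documentation gives for continuing an import from the current branch tip.

```shell
#!/bin/sh
# Emit a git fast-import stream that adds ONE message on top of the
# existing tip of refs/heads/master. No working tree is involved.
# All names here (function, arguments, committer identity) are
# illustrative assumptions, not taken from evenless.pl.
emit_add_commit() {
    file=$1   # path of the new message on disk
    dest=$2   # path to store it under inside the repository
    when=$3   # committer date in fast-import "raw" format: "<epoch> <tz>"
    msg="add $dest"

    # Byte counts, not character counts: "data" needs exact sizes.
    # The $(( )) wrapper strips padding some wc implementations add.
    size=$(( $(wc -c < "$file") ))
    msgsize=$(( $(printf '%s' "$msg" | wc -c) ))

    # The blob record: header, raw bytes, then the optional trailing LF.
    printf 'blob\nmark :1\ndata %s\n' "$size"
    cat "$file"
    printf '\n'

    # The commit record, parented on the current branch tip.
    printf 'commit refs/heads/master\n'
    printf 'committer EvenLess <evenless@example.invalid> %s\n' "$when"
    printf 'data %s\n%s\n' "$msgsize" "$msg"
    printf 'from refs/heads/master^0\n'
    printf 'M 0644 :1 %s\n\n' "$dest"
}

# Possible usage, run from inside the repository (raw is fast-import's
# default date format, so no --date-format flag is needed here):
#   emit_add_commit "$msgfile" "new/$(basename "$msgfile")" \
#       "$(date '+%s %z')" | git fast-import
```

Replacing the "M 0644 :1 path" line with "D path" would drop a message from the new tree in the same way, but the blob stays reachable from older commits, so that alone does not make the object go away permanently.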
for the stats nerds:

$ time perl /home/stewart/evenless/evenless.pl /home/stewart/Maildir/INBOX
git-fast-import statistics:
---------------------------------------------------------------------
Alloc'd objects:     785000
Total objects:       781813 (     79023 duplicates                  )
      blobs  :       781363 (     79023 duplicates     708627 deltas)
      trees  :          449 (         0 duplicates          0 deltas)
      commits:            1 (         0 duplicates          0 deltas)
      tags   :            0 (         0 duplicates          0 deltas)
Total branches:           1 (         1 loads     )
      marks:        1048576 (    860386 unique    )
      atoms:         860557
Memory total:        182780 KiB
       pools:        152116 KiB
     objects:         30664 KiB
---------------------------------------------------------------------
pack_report: getpagesize()            =       4096
pack_report: core.packedGitWindowSize = 1073741824
pack_report: core.packedGitLimit      = 8589934592
pack_report: pack_used_ctr            =          1
pack_report: pack_mmap_calls          =          1
pack_report: pack_open_windows        =          1 /          1
pack_report: pack_mapped              =  388496447 /  388496447
---------------------------------------------------------------------

real    107m43.130s
user    45m25.430s
sys     2m49.440s
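#!/usr/bin/perl -w
# evenless.pl: stream every file under a Maildir into git fast-import as
# a single commit, without ever creating a working tree.

use strict;
use File::stat;

my $FILES;                  # accumulated "M 0644 :mark path" lines
my $mark= 1;                # next fast-import mark to hand out
my $stripdir= $ARGV[0];     # leading directory stripped from stored paths

sub fastimport_blobs ($);

# Walk $dirname recursively, writing one blob record per file.
sub fastimport_blobs ($)
{
    my $dirname= shift @_;
    opendir (my $dirhandle, $dirname) or die "opendir $dirname: $!";
    foreach (readdir $dirhandle)
    {
        next if /^\.\.?$/;
        # Skip index and summary files left by mail clients and notmuch.
        next if /\.cmeta$/;
        next if /\.ibex\.index$/;
        next if /\.ibex\.index\.data$/;
        next if /\.ev-summary$/;
        next if /\.ev-summary-meta$/;
        next if /\.notmuch$/;

        if (-d $dirname.'/'.$_)
        {
            print STDERR "Recursing into $_/ ";
            fastimport_blobs($dirname.'/'.$_);
            print STDERR "\n";
        }
        else
        {
            my $sb= stat("$dirname/$_");
            print FASTIMPORT "blob\n";
            print FASTIMPORT "mark :$mark\n";
            print FASTIMPORT "data ".($sb->size)."\n";
            open FILEIN, "$dirname/$_" or die "open $dirname/$_: $!";
            my $content;
            sysread FILEIN, $content, $sb->size;
            close FILEIN;
            print FASTIMPORT $content;

            # Store the path relative to the directory given on the
            # command line; \Q..\E keeps regex metacharacters literal.
            my $storedir= "$dirname/$_";
            $storedir=~ s/^\Q$stripdir\E//;
            $storedir=~ s/^\///;
            $FILES.="M 0644 :$mark $storedir\n";
            $mark++;
        }
    }
    closedir $dirhandle;
}

open FASTIMPORT, "| git fast-import --date-format=rfc2822"
    or die "cannot start git fast-import: $!";

fastimport_blobs($ARGV[0]);

# One commit referencing every blob written above. The committer address
# is shown as elided by the list archive; the @ is escaped so Perl does
# not try to interpolate an array.
print FASTIMPORT "commit refs/heads/master\n";
print FASTIMPORT "committer EvenLess <evenle...\@evenless> ".`date -R`;
print FASTIMPORT "data 11\n";
print FASTIMPORT "mail commit\n";
print FASTIMPORT $FILES;
print FASTIMPORT "\n";

close FASTIMPORT;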
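
Once an import like the one above has run, the note that git-ls-tree and git-cat-file work for listing and getting objects could look roughly like this. It is a sketch: the helper names list_messages and show_message are hypothetical, and both assume the current directory is the repository and that the import wrote to refs/heads/master as the script does.

```shell
#!/bin/sh
# Read messages straight out of the object store; no checkout needed.
# Helper names are illustrative assumptions.

# List the path of every stored message in the tip commit.
# (git quotes unusual path names in this output unless -z is given.)
list_messages() {
    git ls-tree -r --name-only refs/heads/master
}

# Print one stored message to stdout, given its path in the repository.
show_message() {
    git cat-file blob "refs/heads/master:$1"
}
```

For example, `list_messages | head` shows the first few stored paths, and `show_message cur/<name>` prints the message stored at that path.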
-- Stewart Smith
_______________________________________________
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch