On Wed, 17 Feb 2010 11:21:51 +1100, Stewart Smith <stew...@flamingspork.com> wrote:
> Using fast-import is interesting. Does it update the working tree? The
> big thing I wanted to avoid was creating a working tree (another million
> inodes being created is not ever what I need)
> Also interesting is the mention of creating packs on the fly... this
> could save the time in first writing the object and then packing it (as
> my script does).
> I'm going to play with this....

And I did.

Good news... on my mailstore (which, as I've previously mentioned, takes
about 10 minutes to run 'du' over, about the same time as 'notmuch new'),
using the (attached) evenless.pl to create a single commit with
everything in it:

$ du -sh .git
3.4G    .git

Down from a whopping 14-15GB!!!

My previous effort (git-write-object, create pack every 1000 messages,
rinse, repeat) took all night and got to 3.7GB.

This took only 108 minutes.

In both cases, I was creating the repository on another spindle (a USB 2.0
disk attached to my laptop).

git-ls-tree and git-cat-file both work for listing and getting objects.
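
As a rough illustration of reading mail back out of such a repository, the
two commands can be wrapped like this. The function names list_mail and
show_mail are made up for this sketch, and it assumes the import went to
the master branch, as evenless.pl does.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of evenless.pl).

# list_mail GIT_DIR [REV]: print every stored path, one per line.
list_mail() {
    git --git-dir="$1" ls-tree -r --name-only "${2:-master}"
}

# show_mail GIT_DIR PATH [REV]: write one stored message to stdout.
show_mail() {
    git --git-dir="$1" cat-file blob "${3:-master}:$2"
}
```

Something like "list_mail .git | head" followed by "show_mail .git <one of
the listed paths>" should then print a single message without ever
checking out a working tree.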

The next thing to think about is adding objects as they come
in... creating a new commit with just an added file should be pretty
simple and easy... but this means we get to keep a "revision history" of
the mailstore, which is *possibly* not ideal in terms of storage
efficiency (I'll do a trial with mine of doing one message at a time and
seeing what the end size is).

However, a commit per added mail (or mails) does give us the advantage
of a really well documented and tested backup system :)
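
A minimal sketch of what one such incremental commit could look like as a
fast-import stream. The function name emit_add_mail, the committer address
and the commit message are all invented for the example. It assumes the
branch already exists (the "from refs/heads/<branch>^0" line is how the
fast-import documentation says to restart an import on top of the current
branch tip) and that stored paths contain no newlines or leading double
quotes.

```shell
#!/bin/sh
# emit_add_mail BRANCH FILE STORE_PATH [EPOCH_SECONDS]
# Writes a fast-import stream that adds FILE as STORE_PATH in one new
# commit on top of the existing tip of BRANCH. Hypothetical helper.
emit_add_mail() {
    branch=$1 file=$2 path=$3
    # "date +%s" is widely available but not strictly POSIX.
    when=${4:-$(date +%s)}
    msg="add $path"
    # Byte counts, not character counts; $(( )) trims wc's padding.
    msg_len=$(( $(printf '%s' "$msg" | wc -c) ))
    size=$(( $(wc -c < "$file") ))

    printf 'commit refs/heads/%s\n' "$branch"
    # Placeholder identity; raw date format is "<seconds> <utc offset>".
    printf 'committer EvenLess <evenless@example.invalid> %s +0000\n' "$when"
    printf 'data %d\n%s\n' "$msg_len" "$msg"
    printf 'from refs/heads/%s^0\n' "$branch"
    # "inline" means the file content follows directly as a data command.
    printf 'M 0644 inline %s\n' "$path"
    printf 'data %d\n' "$size"
    cat "$file"
    printf '\n'
}
```

Run from inside the repository, this would be used as
"emit_add_mail master /path/to/new/message new/message | git fast-import".
Every run also writes a new (tiny) pack, so repacking now and then would
probably still be needed.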

Deleting could be hard... if we actually want the objects to go away in a
"permanent" way (rather than just no longer being referenced).

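To make the distinction concrete, here is a sketch of the "easy" kind of
delete as a fast-import stream (emit_delete_mail and the committer address
are hypothetical names for this example). The D line only drops the path
from the new commit's tree; the blob stays reachable through older commits,
so no space is reclaimed. Making it really go away would presumably mean
rewriting history without that blob and then expiring reflogs and pruning
(git reflog expire, git gc --prune=now), which is the hard part.

```shell
#!/bin/sh
# emit_delete_mail BRANCH STORE_PATH [EPOCH_SECONDS]
# Writes a fast-import stream whose single commit removes STORE_PATH from
# the tip of BRANCH. The object itself is NOT removed from the repository.
emit_delete_mail() {
    branch=$1 path=$2
    # "date +%s" is widely available but not strictly POSIX.
    when=${3:-$(date +%s)}
    msg="delete $path"
    # Byte count of the commit message; $(( )) trims wc's padding.
    msg_len=$(( $(printf '%s' "$msg" | wc -c) ))

    printf 'commit refs/heads/%s\n' "$branch"
    # Placeholder identity, as an assumption for the example.
    printf 'committer EvenLess <evenless@example.invalid> %s +0000\n' "$when"
    printf 'data %d\n%s\n' "$msg_len" "$msg"
    # Build on the current branch tip rather than starting a new history.
    printf 'from refs/heads/%s^0\n' "$branch"
    printf 'D %s\n' "$path"
    printf '\n'
}
```

Usage would be along the lines of
"emit_delete_mail master cur/some-message | git fast-import" from inside
the repository.
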
for the stats nerds:

$ time perl /home/stewart/evenless/evenless.pl /home/stewart/Maildir/INBOX

git-fast-import statistics:
Alloc'd objects:     785000
Total objects:       781813 (     79023 duplicates                  )
      blobs  :       781363 (     79023 duplicates     708627 deltas)
      trees  :          449 (         0 duplicates          0 deltas)
      commits:            1 (         0 duplicates          0 deltas)
      tags   :            0 (         0 duplicates          0 deltas)
Total branches:           1 (         1 loads     )
      marks:        1048576 (    860386 unique    )
      atoms:         860557
Memory total:        182780 KiB
       pools:        152116 KiB
     objects:         30664 KiB
pack_report: getpagesize()            =       4096
pack_report: core.packedGitWindowSize = 1073741824
pack_report: core.packedGitLimit      = 8589934592
pack_report: pack_used_ctr            =          1
pack_report: pack_mmap_calls          =          1
pack_report: pack_open_windows        =          1 /          1
pack_report: pack_mapped              =  388496447 /  388496447

real    107m43.130s
user    45m25.430s
sys     2m49.440s

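#!/usr/bin/perl -w

use strict;

# One "M 0644 :<mark> <path>" line per imported file; the whole list is
# written into the single commit at the end.
my $FILES= "";

# Next free fast-import mark; every blob gets its own.
my $mark= 1;

# Leading directory to strip so stored paths are relative to the maildir.
my $stripdir= $ARGV[0];
die "usage: $0 <maildir>\n" unless defined $stripdir;

sub fastimport_blobs ($);
sub fastimport_blobs ($)
{
    my $dirname= shift @_;

    opendir (my $dirhandle, $dirname) or die "opendir $dirname: $!";
    foreach (readdir $dirhandle)
    {
	next if /^\.\.?$/;
	# Skip index/cache files left by mail clients and notmuch.
	next if /\.cmeta$/;
	next if /\.ibex\.index$/;
	next if /\.ibex\.index\.data$/;
	next if /\.ev-summary$/;
	next if /\.ev-summary-meta$/;
	next if /\.notmuch$/;

	if (-d $dirname.'/'.$_)
	{
	    print STDERR "Recursing into $_/ ";
	    # Assumed: the recursive call belongs here, between the messages.
	    fastimport_blobs("$dirname/$_");
	    print STDERR "\n";
	}
	else
	{
	    # Read the whole file as bytes, then send it as a marked blob.
	    open my $filein, '<', "$dirname/$_" or die "open $dirname/$_: $!";
	    binmode $filein;
	    my $content= do { local $/; <$filein> };
	    close $filein;
	    print FASTIMPORT "blob\n";
	    print FASTIMPORT "mark :$mark\n";
	    # Use the length actually read, in case the file changed size.
	    print FASTIMPORT "data ".length($content)."\n";
	    print FASTIMPORT $content;
	    print FASTIMPORT "\n";
	    my $storedir= "$dirname/$_";
	    $storedir=~ s/^\Q$stripdir\E//;
	    $storedir=~ s/^\///;
	    $FILES.="M 0644 :$mark $storedir\n";
	    # Assumed: the mark is advanced once per blob.
	    $mark++;
	}
    }
    closedir $dirhandle;
}

open FASTIMPORT, "| git fast-import --date-format=rfc2822"
    or die "cannot start git fast-import: $!";
binmode FASTIMPORT;

# Assumed: all blobs are streamed first, then referenced by the commit.
fastimport_blobs($stripdir);

print FASTIMPORT "commit refs/heads/master\n";
print FASTIMPORT "committer EvenLess <evenle...\@evenless> ".`date -R`;
print FASTIMPORT "data 11\n";
print FASTIMPORT "mail commit\n";
# Assumed: the accumulated file list is what populates the commit.
print FASTIMPORT $FILES;
print FASTIMPORT "\n";

close FASTIMPORT or die "git fast-import failed\n";
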


Stewart Smith