On Wed, 17 Feb 2010 11:21:51 +1100, Stewart Smith <stew...@flamingspork.com> 
wrote:
> Using fast-import is interesting. Does it update the working tree? The
> big thing I wanted to avoid was creating a working tree (another million
> inodes being created is not ever what I need)
> 
> Also interesting is the mention of creating packs on the fly... this
> could save the time in first writing the object and then packing it (as
> my script does).
> 
> I'm going to play with this....

And I did.

Good news. On my mailstore (which, as I've previously mentioned, takes
about 10 minutes to run 'du' over, about the same time as 'notmuch new'
takes), using the attached evenless.pl (included at the end of this
message) to create a single commit with everything in it:

$ du -sh .git
3.4G    .git

Down from a whopping 14-15GB!!!

My previous effort (git-write-object, create pack every 1000 messages,
rinse, repeat) took all night and got to 3.7GB.

This took only 108 minutes.

In both cases, I was creating the repository on another spindle (a
USB 2.0 disk attached to my laptop).

git-ls-tree and git-cat-file both work for listing and getting objects.
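
Reading mail back out of such a repository could look roughly like the
sketch below. The helper names (ls_tree_cmd, cat_file_cmd, run_cmd) and
the script name are made up for illustration and are not part of
evenless.pl; only the git ls-tree and git cat-file invocations are real
git commands. It assumes the import went to the master branch, as in the
script at the end of this message.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helpers (not part of evenless.pl): build the git commands
# used to list and fetch messages from the mail repository.
sub ls_tree_cmd
{
    my ($gitdir, $ref)= @_;
    # -r recurses into subtrees, --name-only prints just the stored paths
    return ('git', "--git-dir=$gitdir", 'ls-tree', '-r', '--name-only', $ref);
}

sub cat_file_cmd
{
    my ($gitdir, $ref, $path)= @_;
    # "<ref>:<path>" names the blob stored at that path in the commit
    return ('git', "--git-dir=$gitdir", 'cat-file', 'blob', "$ref:$path");
}

# Run a command without a shell and return everything it wrote to stdout.
sub run_cmd
{
    my @cmd= @_;
    open(my $out, '-|', @cmd) or die "Cannot run $cmd[0]: $!";
    binmode $out;
    local $/;
    my $data= <$out>;
    close $out or die "$cmd[0] failed (exit status $?)";
    return defined $data ? $data : '';
}

# Usage: perl readmail.pl /path/to/.git [path/inside/tree]
if (@ARGV)
{
    my ($gitdir, $path)= @ARGV;
    if (defined $path)
    {
        print run_cmd(cat_file_cmd($gitdir, 'master', $path));
    }
    else
    {
        print run_cmd(ls_tree_cmd($gitdir, 'master'));
    }
}
```

With no path argument it lists every stored message; with a path it
prints that one message. No working tree is needed for either.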

The next thing to think about is adding objects as they come in.
Creating a new commit with just an added file should be pretty simple
and easy, but it means we get to keep a "revision history" of the
mailstore, which is *possibly* not ideal in terms of storage efficiency
(I'll do a trial with mine, importing one message at a time, and see
what the end size is).
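
A minimal sketch of what "one commit per added message" might look like
as a fast-import stream. The helper name add_message_stream, the script
name, and the committer address are placeholders, not anything from
evenless.pl. It assumes the branch already exists, that the script is
run from inside the repository, and that the message content is a byte
string (so length() gives the byte count that the "data" command needs).

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: return a fast-import stream that adds one message
# as a new commit on top of an existing branch.
sub add_message_stream
{
    my ($branch, $storepath, $content, $when)= @_;
    my $msg= "add $storepath";
    my $stream= "blob\nmark :1\n";
    $stream.= "data ".length($content)."\n".$content."\n";
    $stream.= "commit refs/heads/$branch\n";
    # Placeholder identity; $when is "<seconds since epoch> <tz offset>",
    # which is fast-import's default (raw) date format.
    $stream.= "committer EvenLess <evenless\@example.invalid> $when\n";
    $stream.= "data ".length($msg)."\n".$msg."\n";
    # "^0" makes a fresh fast-import run start from the branch's current tip
    $stream.= "from refs/heads/${branch}^0\n";
    $stream.= "M 0644 :1 $storepath\n\n";
    return $stream;
}

# Usage (inside the repository): perl addmail.pl /path/to/message cur/message
if (@ARGV == 2)
{
    my ($file, $storepath)= @ARGV;
    open(my $in, '<:raw', $file) or die "Cannot open $file: $!";
    my $content= do { local $/; <$in> };
    close $in;
    $content= '' unless defined $content;

    open(my $fi, '|-', 'git', 'fast-import')
        or die "Cannot start git fast-import: $!";
    binmode $fi;
    print $fi add_message_stream('master', $storepath, $content,
                                 time()." +0000");
    close $fi or die "git fast-import failed (exit status $?)";
}
```

Several messages could share one commit by emitting more blobs (with
distinct marks) and more "M" lines before the final blank line.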

However, a commit per added mail (or batch of mails) does give us the
advantage of a really well documented and tested backup system :)

Deleting could be hard if we actually want the objects to go away in a
"permanent" way (not just no longer be referenced).
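
For the easy half of deletion, fast-import has a "D" file command that
drops a path from the new commit's tree. The sketch below only does
that; the helper name delete_message_stream, the script name, and the
committer address are placeholders. The blob stays reachable from older
commits, so making it really go away would still need the history
rewritten and the unreferenced objects pruned, which this does not
attempt.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: return a fast-import stream whose single commit
# removes one path from the tip of an existing branch.
sub delete_message_stream
{
    my ($branch, $storepath, $when)= @_;
    my $msg= "delete $storepath";
    return "commit refs/heads/$branch\n"
        # Placeholder identity; $when is "<seconds since epoch> <tz offset>"
        ."committer EvenLess <evenless\@example.invalid> $when\n"
        ."data ".length($msg)."\n".$msg."\n"
        # Continue from the current tip of the branch
        ."from refs/heads/${branch}^0\n"
        # "D" deletes the path from this commit's tree only
        ."D $storepath\n\n";
}

# Usage (inside the repository):
#   perl delmail.pl cur/message | git fast-import
if (@ARGV == 1)
{
    binmode STDOUT;
    print delete_message_stream('master', $ARGV[0], time()." +0000");
}
```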

For the stats nerds:

$ time perl /home/stewart/evenless/evenless.pl /home/stewart/Maildir/INBOX

git-fast-import statistics:
---------------------------------------------------------------------
Alloc'd objects:     785000
Total objects:       781813 (     79023 duplicates                  )
      blobs  :       781363 (     79023 duplicates     708627 deltas)
      trees  :          449 (         0 duplicates          0 deltas)
      commits:            1 (         0 duplicates          0 deltas)
      tags   :            0 (         0 duplicates          0 deltas)
Total branches:           1 (         1 loads     )
      marks:        1048576 (    860386 unique    )
      atoms:         860557
Memory total:        182780 KiB
       pools:        152116 KiB
     objects:         30664 KiB
---------------------------------------------------------------------
pack_report: getpagesize()            =       4096
pack_report: core.packedGitWindowSize = 1073741824
pack_report: core.packedGitLimit      = 8589934592
pack_report: pack_used_ctr            =          1
pack_report: pack_mmap_calls          =          1
pack_report: pack_open_windows        =          1 /          1
pack_report: pack_mapped              =  388496447 /  388496447
---------------------------------------------------------------------


real    107m43.130s
user    45m25.430s
sys     2m49.440s
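
A few ratios can be read off the numbers above. The sketch below just
does the arithmetic; the helper name pct is made up. It relies on the
script assigning one mark per file, so the 860386 unique marks are the
files fed in, and that figure equals the 781363 stored blobs plus the
79023 duplicates.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: percentage of $part in $whole.
sub pct
{
    my ($part, $whole)= @_;
    return 100 * $part / $whole;
}

# Figures copied from the fast-import statistics and "time" output above.
my $stored_blobs= 781363;
my $duplicates= 79023;
my $deltas= 708627;
my $seconds= 107 * 60 + 43.130;

# One mark per file, so this should match the "unique" marks count.
my $files= $stored_blobs + $duplicates;

printf "files fed in:           %d\n", $files;
printf "duplicate files:        %.1f%%\n", pct($duplicates, $files);
printf "blobs stored as deltas: %.1f%%\n", pct($deltas, $stored_blobs);
printf "throughput:             %.1f files/s\n", $files / $seconds;
```

So roughly one file in eleven was a byte-for-byte duplicate, about nine
in ten stored blobs were written as deltas, and the import ran at a bit
over 130 files per second of wall-clock time.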


#!/usr/bin/perl -w

use strict;

my $tree= "";

use IPC::Open2;

use File::stat;

my $FILES= "";

my $mark= 1;

my $stripdir= $ARGV[0];

sub fastimport_blobs ($);
sub fastimport_blobs ($)
{
    my $dirname= shift @_;

    opendir (my $dirhandle, $dirname)
	or die "Cannot open directory $dirname: $!";
    foreach (readdir $dirhandle)
    {
	next if /^\.\.?$/;
	next if /\.cmeta$/;
	next if /\.ibex\.index$/;
	next if /\.ibex\.index\.data$/;
	next if /\.ev-summary$/;
	next if /\.ev-summary-meta$/;
	next if /\.notmuch$/;

	if (-d $dirname.'/'.$_)
	{
	    print STDERR "Recursing into $_/ ";
	    fastimport_blobs($dirname.'/'.$_);
	    print STDERR "\n";
	}
	else
	{
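	    # Read the whole message in binary mode so the byte count is exact.
	    open(my $filein, '<:raw', "$dirname/$_")
		or die "Cannot open $dirname/$_: $!";
	    my $content= do { local $/; <$filein> };
	    close $filein;
	    $content= '' unless defined $content;
	    print FASTIMPORT "blob\n";
	    print FASTIMPORT "mark :$mark\n";
	    # Announce the number of bytes actually read rather than the stat
	    # size, so the stream stays in sync if the file changed meanwhile.
	    print FASTIMPORT "data ".length($content)."\n";
	    print FASTIMPORT $content;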
	    my $storedir= "$dirname/$_";
	    # \Q...\E: treat the directory name literally, not as a regex
	    $storedir=~ s/^\Q$stripdir\E//;
	    $storedir=~ s/^\///;
	    $FILES.="M 0644 :$mark $storedir\n";
	    $mark++;
	}
    }
}

open(FASTIMPORT, "| git fast-import --date-format=rfc2822")
    or die "Cannot start git fast-import: $!";
binmode FASTIMPORT;

fastimport_blobs($ARGV[0]);

print FASTIMPORT "commit refs/heads/master\n";
print FASTIMPORT "committer EvenLess <evenle...@evenless> ".`date -R`;
print FASTIMPORT "data 11\n";
print FASTIMPORT "mail commit\n";
print FASTIMPORT $FILES;
print FASTIMPORT "\n";

close FASTIMPORT or die "git fast-import failed (exit status $?)";



-- 
Stewart Smith
_______________________________________________
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch
