Thank you, Dale,

> The problem is that adding a commit to the *beginning* of the chain
> requires a bit of work, because you have to recreate all of the later
> commits so they reference the first commit.

Are you certain about this? On a first pass through the Git shallow.c
code, I have the feeling that I can avoid this by creating the objects
with a shallow flag. Then, when I need to add a parent, I could just
attach the parent and unregister the object as shallow, without needing
to recreate it (which seems to be exactly how --unshallow works).

Thanks,
George

On Friday, April 25, 2014 11:38:20 AM UTC-7, Dale Worley wrote:
>
> > From: George Georgiev <george.ge...@gmail.com>
> > 
> > I am researching how I can convert a file system with backup history
> > into a git repository.
> >
> > I would like to do this in phases. The first phase is to create a
> > shallow repo with only the head files. Then I would like to unshallow
> > it step by step. The goal is to have a valid git repo to start working
> > with as soon as possible.
>
> The details depend on the specifics of your situation.  But as long as 
> you can create copies of the file trees which are the historical 
> snapshots, you can add them to the repository and string them together 
> to form a series of commits.  The problem is that adding a commit to 
> the *beginning* of the chain requires a bit of work, because you have 
> to recreate all of the later commits so they reference the first 
> commit.  I don't think there's a "porcelain" command to do that; you
> have to use "plumbing" commands to recreate each commit in the chain
> one by one.
>
> I don't know of any references, but the following is a short Perl
> program I use to prune commits from a repository when they are spaced
> too closely together relative to how far in the past they are.  It
> shows how you go about recreating a chain of commits.
>
> Dale 
>
> #! /bin/perl 
>
> use strict; 
>
> # Process the -d switch, which must have a numeric argument. 
> my($debug) = 0; 
> if ($ARGV[0] =~ /^-d([\d]+)$/) { 
>     $debug = $1; 
>     print STDERR "\$debug = $debug\n"; 
>     shift; 
> } 
> die "Unknown argument(s): ", join(' ', @ARGV) if $#ARGV >= 0; 
>
> # This is the rate at which commits are to be retained: 
> my($rate); 
> # At a time N in the past, commits should be spaced at most N/$rate 
> # apart. 
> # Thus, larger $rate values mean to keep more commits around. 
> # The rate is stored in the Git configuration as time-warp.rate. 
> # If the user entered it on the command line, it would be easier for 
> # the user to fumble-finger a small value and delete much of the 
> # history he wanted to save. 
> my($config_name) = 'time-warp.rate'; 
> my($command) = 'git config ' . $config_name; 
> chomp($rate = `$command`); 
> my($r) = $? >> 8; 
> if ($r != 0) { 
>     warn "Could not obtain Git configuration value '$config_name'.\n"; 
>     die "Error executing '$command': exit code $r\n" if $r; 
> } elsif ($rate !~ /^\d+$/ || $rate < 1) {
>     die "Rate value '$rate' is syntactically incorrect or less than 1.\n"; 
> } 
> print STDERR "\$rate = $rate\n" if $debug; 
>
> # Get the hashes and times of the commit history. 
> # Note that we are assuming that the current branch of the repository 
> # is the branch to be operated upon. 
> $command = "git log --pretty=tformat:'%H %ct'"; 
> print STDERR "\$command = $command\n" if $debug >= 3; 
> open(GIT, "-|", $command) || 
>     die "Error executing '$command' for input: $!\n"; 
> # Note that "git log" lists commits going back in time, so @hashes and
> # @times will describe the latest commits first.
> my(@hashes, @times); 
> while (<GIT>) { 
>     chomp; 
>     my($hash, $time) = split; 
>     push(@hashes, $hash); 
>     push(@times, $time); 
>     print STDERR "\$hashes[", $#hashes, "] = $hash, ",
>         "\$times[", $#times, "] = $time\n" if $debug >= 2;
> } 
> close(GIT) or die "Error closing '$command': $!\n";
>
> # Get the "now" time, which is the time of the last commit. 
> my($now) = $times[0]; 
> print STDERR "\$now = $now\n" if $debug; 
>
> # Now, working from oldest to newest, look at each commit and decide
> # whether to recreate it.
> # The last commit we've recreated and its time. 
> my($last_commit) = '0000000000000000000000000000000000000000'; 
> my($last_commit_time) = 0; 
> my($commits_created) = 0; 
> print STDERR "\$last_commit = $last_commit, ",
>     "\$last_commit_time = $last_commit_time\n"
>     if $debug;
> # Cycle through the commits from the oldest to the newest, recreating 
> # the commit chain, retaining the commits we desire. 
> for (my $i = $#hashes; $i >= 0; $i--) { 
>     print STDERR "\$i = $i, \$hashes[$i] = $hashes[$i], ",
>         "\$times[$i] = $times[$i]\n"
>         if $debug;
>     # Test if commit $i-1 (the next-newer commit than this one) is 
>     # close enough to $last_commit that we can omit creating a new 
>     # commit from this one, commit $i.  We always generate a new 
>     # commit from commit 0, which is the newest. 
>     if ($i > 0 && $debug) { 
>         print STDERR "\$times[", $i-1, "] = $times[$i-1], ",
>             "\$last_commit_time = $last_commit_time, \$now = $now\n";
>         print STDERR $times[$i-1] - $last_commit_time, ' >= ',
>             ($now - $times[$i-1]) / $rate, "\n";
>     } 
>     if ($i == 0 || 
>         $times[$i-1] - $last_commit_time >= ($now - $times[$i-1]) / $rate) {
>         print STDERR "Recreate commit $i: $hashes[$i]\n" if $debug; 
>         # Commit $i-1 (the next-newer commit than this one) is too far
>         # from the last new commit we created, so we have to create a
>         # new commit out of commit $i.
>         $last_commit = &create_new_commit($hashes[$i], $last_commit); 
>         $last_commit_time = $times[$i]; 
>         $commits_created++; 
>         print STDERR "\$last_commit = $last_commit, ",
>             "\$last_commit_time = $last_commit_time\n"
>             if $debug;
>     } 
> } 
>
> # Set the HEAD to the new commit. 
> # Tell git to check that the current HEAD is $hashes[0] (what it was
> # previously), so that if the repository has been modified during our
> # execution, we can abort.
> my(@command) = ('git', 'update-ref', 'HEAD', $last_commit, $hashes[0]); 
> system(@command); 
> $r = $? >> 8; 
> die "Error executing '" . join(' ', @command) . "': exit code $r\n" if $r; 
> print STDERR "system(" . join(' ', @command) . "): exit code $r\n"
>     if $debug;
>
> # Update the index to match HEAD. 
> # Specify -q, because "git reset" can incorrectly report a lot of files as 
> # unstaged changes. 
> @command = ('git', 'reset', '-q', 'HEAD'); 
> system(@command); 
> $r = $? >> 8; 
> die "Error executing '" . join(' ', @command) . "': exit code $r\n" if $r; 
> print STDERR "system(" . join(' ', @command) . "): exit code $r\n"
>     if $debug;
>
> print $#hashes+1, " previous commits, ", 
>     $commits_created, " commits were recreated, ", 
>     $#hashes-$commits_created+1, " commits were dropped\n"; 
>
> # Garbage collect the repository. 
> print "Garbage collecting and compressing the repository.\n"; 
> # Use "git gc --aggressive", as the memory consumed by repacking can 
> # be controlled by the pack.windowMemory configuration value. 
> @command = ('git', 'gc', '--quiet', '--aggressive'); 
> system(@command); 
> $r = $? >> 8; 
> die "Error executing '" . join(' ', @command) . "': exit code $r\n" if $r; 
> print STDERR "system(" . join(' ', @command) . "): exit code $r\n"
>     if $debug;
>
> # Print out total space usage. 
> @command = ('du', '-sh', '.git'); 
> system(@command); 
> $r = $? >> 8; 
> die "Error executing '" . join(' ', @command) . "': exit code $r\n" if $r; 
> print STDERR "system(" . join(' ', @command) . "): exit code $r\n"
>     if $debug;
>
> exit 0; 
>
> # Create a new commit. 
> # Takes as input the hash of the commit to recreate and the hash to be
> # used as the parent of the new commit.
> # Returns the new commit hash. 
> sub create_new_commit { 
>     my($old_commit, $parent) = @_; 
>     print STDERR "\&create_new_commit('$old_commit', '$parent')\n" 
>         if $debug >= 3; 
>     print STDERR "\$old_commit = $old_commit, \$parent = $parent\n" 
>         if $debug >= 3; 
>
>     my($command) = "git cat-file -p $old_commit"; 
>     print STDERR "\$command = $command\n" if $debug >= 3; 
>     open(GIT, "-|", $command) || 
>         die "Error executing '$command' for input: $!\n"; 
>     # Parse the header part of the commit object. 
>     my($tree, $author_name, $author_email, $author_date, 
>        $committer_name, $committer_email, $committer_date); 
>     while (<GIT>) { 
>         chomp; 
>         print STDERR "\$_ = $_\n" if $debug >= 3; 
>         if ($_ eq '') { 
>             last; 
>         } elsif (/^tree (.*)/) { 
>             $tree = $1; 
>         } elsif (/^author (.*) <([^ ]*)> ([-+\d\s]*)$/) { 
>             $author_name = $1; 
>             $author_email = $2; 
>             $author_date = $3; 
>         } elsif (/^committer (.*) <([^ ]*)> ([-+\d\s]*)$/) { 
>             $committer_name = $1; 
>             $committer_email = $2; 
>             $committer_date = $3; 
>         } else { 
>             ; 
>         } 
>     } 
>     # The remainder of the commit object is the commit message. 
>     my(@commit_message) = <GIT>; 
>     print STDERR "\@commit_message = '", join("\n", @commit_message), "'\n"
>         if $debug >= 3;
>     close(GIT) or die "Error closing '$command': $!\n";
>
>     # Set up the environment for the author and committer information. 
>     $ENV{'GIT_AUTHOR_NAME'} = $author_name; 
>     print STDERR "GIT_AUTHOR_NAME = '$ENV{'GIT_AUTHOR_NAME'}'\n" 
>         if $debug >= 3; 
>     $ENV{'GIT_AUTHOR_EMAIL'} = $author_email; 
>     print STDERR "GIT_AUTHOR_EMAIL = '$ENV{'GIT_AUTHOR_EMAIL'}'\n" 
>         if $debug >= 3; 
>     $ENV{'GIT_AUTHOR_DATE'} = $author_date; 
>     print STDERR "GIT_AUTHOR_DATE = '$ENV{'GIT_AUTHOR_DATE'}'\n" 
>         if $debug >= 3; 
>     $ENV{'GIT_COMMITTER_NAME'} = $committer_name; 
>     print STDERR "GIT_COMMITTER_NAME = '$ENV{'GIT_COMMITTER_NAME'}'\n" 
>         if $debug >= 3; 
>     $ENV{'GIT_COMMITTER_EMAIL'} = $committer_email; 
>     print STDERR "GIT_COMMITTER_EMAIL = '$ENV{'GIT_COMMITTER_EMAIL'}'\n" 
>         if $debug >= 3; 
>     $ENV{'GIT_COMMITTER_DATE'} = $committer_date; 
>     print STDERR "GIT_COMMITTER_DATE = '$ENV{'GIT_COMMITTER_DATE'}'\n" 
>         if $debug >= 3; 
>     my(@command) = ("git", "commit-tree", $tree); 
>     push(@command, "-p", $parent) 
>         unless $parent eq '0000000000000000000000000000000000000000'; 
>     print STDERR "\@command = '", join(' ', @command), "'\n"
>         if $debug >= 3;
>
>     # Set up to handle both stdin and stdout of the command. 
>     pipe(PARENT_RDR, CHILD_WTR); 
>     pipe(CHILD_RDR, PARENT_WTR); 
>
>     my($new_commit); 
>     my($pid) = fork(); 
>     print STDERR "\$pid = '$pid'\n" if $debug >= 3; 
>     if (!defined($pid)) { 
>         # Error forking. 
>         die "Cannot fork to create subprocess: $!"; 
>     } elsif ($pid) { 
>         print STDERR "Parent.\n" if $debug >= 3; 
>         # This is the parent process. 
>         close CHILD_RDR; 
>         close CHILD_WTR; 
>         # Write the commit message to the subprocess. 
>         print PARENT_WTR join('', @commit_message); 
>         close PARENT_WTR; 
>         # Read the new commit hash from the subprocess. 
>         chomp($new_commit = <PARENT_RDR>); 
>         close PARENT_RDR; 
>         # Wait for the subprocess to terminate. 
>         waitpid($pid, 0); 
>         # Check its exit status. 
>         my($r) = $? >> 8; 
>         print STDERR "\$r = $r\n" if $debug >= 3; 
>         die "Error executing '" . join(' ', @command) . "': exit code $r\n"
>             if $r;
>     } else { 
>         print STDERR "Child.\n" if $debug >= 3; 
>         close PARENT_RDR; 
>         close PARENT_WTR; 
>         open(STDIN, "<&CHILD_RDR"); 
>         open(STDOUT, ">&CHILD_WTR"); 
>         exec(@command) || die "Error exec('", join(' ', @command), "'): $!\n";
>     } 
>     print STDERR "\$new_commit = '$new_commit'\n" if $debug >= 3; 
>
>     # Return the new commit hash. 
>     return $new_commit; 
> } 
>
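The keep-or-drop condition in the main loop of the script above can be lifted out into a pure function, which makes it easy to see what a given time-warp.rate would do before any history is rewritten. This is only a sketch: commits_to_keep is a hypothetical name, the timestamps are invented, and it mirrors the loop's condition rather than calling git.

```perl
use strict;
use warnings;

# Mirror of the keep/drop test in the script's main loop, with no git calls.
# $times is newest-first (as "git log" lists them); returns the indexes of
# the commits that would be recreated, oldest first.
sub commits_to_keep {
    my ($times, $rate) = @_;
    my $now = $times->[0];       # time of the newest commit
    my $last_kept_time = 0;      # same starting value as $last_commit_time
    my @keep;
    for (my $i = $#{$times}; $i >= 0; $i--) {
        # Commit 0 (the newest) is always kept. Otherwise keep commit $i
        # unless the next-newer commit is closer to the last kept commit
        # than (age of that next-newer commit) / $rate.
        if ($i == 0
            || $times->[$i-1] - $last_kept_time
                   >= ($now - $times->[$i-1]) / $rate) {
            push @keep, $i;
            $last_kept_time = $times->[$i];
        }
    }
    return @keep;
}

my @times = (1000, 990, 900, 899, 898, 500);   # made-up timestamps
print join(' ', commits_to_keep(\@times, 2)), "\n";      # prints "5 4 2 1 0"
print join(' ', commits_to_keep(\@times, 1000)), "\n";   # prints "5 4 3 2 1 0"
```

With a rate of 2, the commit at time 899 is dropped: the next-newer commit (900) is only 2 after the last kept one (898), while 900 is 100 in the past and 100 / 2 = 50. Raising the rate to 1000 keeps every commit, matching the script's note that larger rates keep more history.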

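The pipe()/fork()/exec sequence in create_new_commit, which feeds the commit message to "git commit-tree" on stdin and reads the new hash from its stdout, can also be written with Perl's core IPC::Open2 module. This is a swapped-in technique, not what Dale's script does; run_with_stdin is a hypothetical helper name, and the commented commit-tree call simply reuses the variables from create_new_commit.

```perl
use strict;
use warnings;
use IPC::Open2 qw(open2);

# Run @$cmd, feed it $input on stdin, and return everything it writes to
# stdout. Dies if the command exits with a nonzero status.
sub run_with_stdin {
    my ($cmd, $input) = @_;
    my $pid = open2(my $child_out, my $child_in, @$cmd);
    # Write all input before reading: fine for commands such as
    # "git commit-tree" that read stdin to EOF before producing output.
    print {$child_in} $input;
    close $child_in;
    my $output = do { local $/; <$child_out> };
    close $child_out;
    waitpid($pid, 0);            # open2 does not reap the child itself
    my $status = $? >> 8;
    die "Error executing '@$cmd': exit code $status\n" if $status;
    return $output;
}

# Hypothetical use inside create_new_commit:
# chomp(my $new_commit = run_with_stdin(
#     ['git', 'commit-tree', $tree, '-p', $parent], join('', @commit_message)));

# Self-contained demo using the running perl as the child process.
print run_with_stdin([$^X, '-pe', '$_ = uc'], "hello\n");   # prints "HELLO"
```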
-- 
You received this message because you are subscribed to the Google Groups "Git 
for human beings" group.