On Tue, Feb 18, 2020 at 05:27:52AM -0800, Steve Cobrin wrote:

> Historically we were really old-school and stored different versions of 
> files with a timestamp appended to the end of the filename, and stashed 
> them into a directory, e.g.
> 
> foo
> .archive/YYYYmmdd
> 
> Now we want to put all the instances of the file into a Git repo, is there 
> any easy way to do it?

Depends on what you define as "easy".

Git has a whole lot of low-level commands to synthesize commits, so I'd
say in your case you could write a program in any suitable programming
language which would call out to Git.

The program would enumerate the files under ".archive", sort them by
the date parsed from those "YYYYmmdd"-formatted names, and then synthesize
a series of commits referring to these files.

Here is a quick stab at undertaking such a task. The script presupposes
it is run in a directory one level higher than ".archive" and it creates
a Git repository named "dest" (it is recreated if it exists). The contents
of each version of the file are recorded in commits as "foo.txt".

--------------------------------8<--------------------------------
#!/bin/sh

set -e -u

src=.archive
dest=dest

test -d "$dest" && rm -rf "$dest"
git init "$dest"

GIT_DIR="$dest/.git"
export GIT_DIR

find "$src" -mindepth 1 -maxdepth 1 -type f -printf '%f\n' |
sed -nEe 's!^(.{4})(.{2})(.{2})$!& \1-\2-\3!p' | {
        set -e -u
        while read fn ymd; do
                sec=`date --date="$ymd" +%s`
                printf '%s\t%s\t%s\n' "$sec" "$ymd" "$fn"
        done | sort -k 1 -n
} | {
        set -e -u
        parent=''
        while read _ ymd fn; do
                sha=`git hash-object -w "$src/$fn"`
                git update-index --add --replace --cacheinfo 0644,$sha,foo.txt
                sha=`git write-tree`
                GIT_AUTHOR_DATE="${ymd}T00:00:00"
                export GIT_AUTHOR_DATE
                if [ "x$parent" = "x" ]; then
                        parent=`git commit-tree -m 'xxx' "$sha"`
                else
                        parent=`git commit-tree -m 'xxx' -p "$parent" "$sha"`
                fi
                echo "Committed: $parent"
        done
        git update-ref HEAD "$parent"
}
--------------------------------8<--------------------------------

How the script rolls:

- The ".archive" directory is searched for files,
  only one level deep.

- The names of the files are filtered through a sed script which
  extracts the first four, the following two and then another following
  two characters for each and then prints the original string and
  the extracted bits joined by a dash - so that "YYYYmmdd" gets converted
  to "YYYYmmdd YYYY-mm-dd".

- The output of the sed script is fed to a compound shell command
  which converts each received "YYYY-mm-dd" to the number of seconds
  since the UNIX epoch corresponding to the source date.

  The result is joined with the original file name and the source date
  and then sorted using numeric sort on the number of seconds.

- The sorted output is piped to another compound shell command
  which injects the named file into the destination Git database
  then updates the index with the SHA1 name of the injected blob
  recording it as a file (hence mode 0644 with the name "foo.txt").

  The index is then written as a new Git tree object (a tree object
  represents the contents of what you'd call a directory) and then
  its SHA1 name is used to record a new commit with the message "xxx".

- Once the string of commits is formed, the HEAD reference in the
  destination repository is updated to point to that last commit.
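
To make the first two stages easier to follow, here is a minimal sketch
of just the sed filter and the sort, run on a few made-up names instead
of a real ".archive" directory. The sample names and the variable names
are my own assumptions for illustration, and GNU date is assumed, as in
the script above.

```sh
#!/bin/sh
# Hypothetical sample names; "README" does not match the 8-character pattern.
names='20200218
20190105
20191231
README'

# Same sed expression as in the script: keep only 8-character names
# and append the date rewritten as YYYY-mm-dd.
converted=$(printf '%s\n' "$names" |
        sed -nEe 's!^(.{4})(.{2})(.{2})$!& \1-\2-\3!p')
printf '%s\n' "$converted"

# Convert each date to seconds since the epoch, sort numerically,
# then keep only the file name column (the third, tab-separated).
sorted=$(printf '%s\n' "$converted" | while read fn ymd; do
        sec=$(date --date="$ymd" +%s)
        printf '%s\t%s\t%s\n' "$sec" "$ymd" "$fn"
done | sort -k 1 -n | cut -f 3)
printf '%s\n' "$sorted"
```

The first printf shows the "YYYYmmdd YYYY-mm-dd" pairs with "README"
dropped; the second shows the names ordered from oldest to newest.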


Two somewhat subtle points are the GIT_DIR and GIT_AUTHOR_DATE
environment variables. Exporting GIT_DIR makes every Git command the
script calls look for the Git object database in the specified location
rather than apply the usual heuristics of searching the current
directory and its parent directories. GIT_AUTHOR_DATE forces
`git commit-tree` to record that date as the author date of the
generated commit object instead of the current time; the committer date
is a separate field and is controlled by GIT_COMMITTER_DATE.

You might also want to set (and export) other GIT_AUTHOR_* variables in
order to set authorship of the commits.
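
As a rough illustration of these environment variables, the sketch
below builds a single commit in a scratch repository and then reads the
author and date back. The identity, the temporary directory and the
variable names are placeholders I made up; only the Git commands and the
GIT_* variables themselves are real.

```sh
#!/bin/sh
set -e -u

# Scratch location; "dest" mirrors the repository name used in the script.
work=$(mktemp -d)
cd "$work"
git init -q dest

# Point every subsequent Git command at that repository explicitly.
GIT_DIR="$work/dest/.git"
export GIT_DIR

# Placeholder identity and date (assumed values, for illustration only).
GIT_AUTHOR_NAME='Jane Doe'
GIT_AUTHOR_EMAIL='jane@example.com'
GIT_COMMITTER_NAME=$GIT_AUTHOR_NAME
GIT_COMMITTER_EMAIL=$GIT_AUTHOR_EMAIL
GIT_AUTHOR_DATE='2020-02-18T00:00:00'
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL
export GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL GIT_AUTHOR_DATE

# Build a one-file commit from plumbing commands, much as the script does.
# 100644 is the full mode Git stores for a regular, non-executable file.
blob=$(printf 'hello\n' | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644,"$blob",foo.txt
tree=$(git write-tree)
commit=$(git commit-tree -m 'xxx' "$tree")

# Read back what ended up in the commit object.
author=$(git show -s --format='%an <%ae>' "$commit")
adate=$(git show -s --format='%ad' --date=short "$commit")
printf '%s\n%s\n' "$author" "$adate"
```

The commit carries the placeholder author and the date taken from
GIT_AUTHOR_DATE, not the moment the sketch was run.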

Also note a subtle point about the interpretation of dates, which
happens here in two places: `date --date=...` interprets the date and
then `git commit-tree` does the same. By default, both of them interpret
the date as referring to your local timezone (well, the timezone as seen
from the shell running the script). This might be perfectly OK, but if
not, you might need to resort to various tricks to tell both commands
which timezone the date is to be interpreted in.
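
One such trick, sketched here under the assumption of GNU date, is to
set the TZ environment variable for the command. The example only shows
the effect on `date`; the sample date and the variable names are mine.

```sh
#!/bin/sh
ymd=2020-02-18

# Midnight of the same calendar date, read in two different zones.
# "JST-9" is a POSIX TZ string for a zone 9 hours ahead of UTC.
utc=$(TZ=UTC date --date="$ymd" +%s)
east=$(TZ=JST-9 date --date="$ymd" +%s)

printf 'UTC:        %s\n' "$utc"
printf 'UTC+9:      %s\n' "$east"
printf 'difference: %s\n' "$((utc - east))"
```

The two results differ by nine hours' worth of seconds, which is enough
to move a commit onto the neighbouring day if the zones are mixed up.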
