On Tue, Feb 18, 2020 at 05:27:52AM -0800, Steve Cobrin wrote:

> Historically we were really old-school and stored different versions of
> files with a timestamp appended to the end of the filename, and stashed
> them into a directory, e.g.
>
> foo
> .archive/YYYYmmdd
>
> Now we want to put all the instances of the file int a Git repo, is there
> any easy way to do it?

Depends on what you define as "easy".

Git has a whole lot of low-level commands to synthesize commits, so I'd
say in your case you could write a program in any suitable programming
language which would call out to Git. The program would enumerate the
files under ".archive", sort them by the date parsed from their
"YYYYmmdd"-formatted names, and then synthesize a series of commits
referring to these files.

Here is a quick stab at undertaking such a task.
The script presupposes it is run in the directory one level above
".archive", and it creates a Git repository named "dest" (which is
recreated if it already exists). Each version of the file is recorded
in its commit as "foo.txt".

--------------------------------8<--------------------------------
#!/bin/sh

set -e -u

src=.archive
dest=dest

# Start from scratch: drop any previous destination repository.
test -d "$dest" && rm -rf "$dest"
git init "$dest"

# Make every Git command below operate on the destination repository.
GIT_DIR="$dest/.git"
export GIT_DIR

# Emit "YYYYmmdd YYYY-mm-dd" for each file name in the archive.
find "$src" -mindepth 1 -maxdepth 1 -type f -printf '%f\n' |
  sed -nEe 's!^(.{4})(.{2})(.{2})$!& \1-\2-\3!p' | {
    set -e -u
    # Prefix each record with seconds since the epoch and sort on it.
    while read fn ymd; do
      sec=`date --date="$ymd" +%s`
      printf '%s\t%s\t%s\n' "$sec" "$ymd" "$fn"
    done | sort -k 1 -n
  } | {
    set -e -u
    parent=''
    while read _ ymd fn; do
      # Store the file as a blob and stage it under the name foo.txt.
      sha=`git hash-object -w "$src/$fn"`
      git update-index --add --replace --cacheinfo 0644,$sha,foo.txt
      sha=`git write-tree`
      GIT_AUTHOR_DATE="${ymd}T00:00:00"
      export GIT_AUTHOR_DATE
      # The first commit has no parent; each later one chains to the last.
      if [ "x$parent" = "x" ]; then
        parent=`git commit-tree -m 'xxx' "$sha"`
      else
        parent=`git commit-tree -m 'xxx' -p "$parent" "$sha"`
      fi
      echo "Committed: $parent"
    done
    git update-ref HEAD "$parent"
  }
--------------------------------8<--------------------------------

How the script rolls:

- The ".archive" directory is searched for files, only one level deep.

- The names of the files are filtered through a sed script which
  extracts the first four, the following two and then another following
  two characters of each name, and then prints the original string
  followed by the extracted bits joined with dashes, so that "YYYYmmdd"
  gets converted to "YYYYmmdd YYYY-mm-dd".

- The output of the sed script is fed to a compound shell command which
  converts each received "YYYY-mm-dd" to the number of seconds since the
  UNIX epoch corresponding to that date. The result is joined with the
  source date and the original file name, and the records are then
  sorted numerically on the number of seconds.

- The sorted output is piped to another compound shell command which
  injects the named file into the destination Git object database and
  then updates the index with the SHA1 name of the injected blob,
  recording it as a regular file (hence mode 0644) named "foo.txt".
  The index is then written out as a new Git tree object (a tree object
  represents the contents of what you'd call a directory), and the SHA1
  name of that tree is used to record a new commit with the message
  "xxx".

- Once the string of commits is formed, the HEAD reference in the
  destination repository is updated to point to the last commit.

There are two somewhat subtle points here. The first is exporting the
GIT_DIR environment variable: it makes every Git command the script
calls look for the Git object database in the specified location rather
than apply the usual heuristics of searching the current directory and
then its parent directories. The second is the GIT_AUTHOR_DATE
environment variable, which forces `git commit-tree` to use that date
in the generated commit object instead of the current one.
You might also want to set (and export) other GIT_AUTHOR_* variables in
order to set the authorship of the commits.

Also note a subtlety about the interpretation of dates. It happens here
in two places: `date --date=...` interprets the date, and then
`git commit-tree` does the same.
By default, both of them interpret the dates as being in your local
timezone (well, the timezone as seen by the shell running the script).
This might be perfectly OK, but if not, you might need to resort to
various tricks to tell both commands which timezone the date is really
in.
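
One such trick, as a rough sketch rather than the definitive way to do
it: have `date` interpret the source date in a fixed zone and hand the
result to Git in its raw "<seconds> <offset>" date format, which leaves
nothing for Git to guess. The helper name `utc_git_date` is made up for
illustration, and treating the archive dates as midnight UTC is an
assumption; substitute whatever zone the names really refer to. Like
the script, it relies on GNU date for `--date`.

```shell
# Hypothetical helper: convert "YYYY-mm-dd" to Git's raw date format,
# "<seconds since the epoch> <UTC offset>", taking the date as midnight UTC.
# TZ=UTC0 applies to this one `date` call only, so the local timezone
# of the shell running the script no longer matters.
utc_git_date() {
  printf '%s +0000\n' "$(TZ=UTC0 date --date="$1 00:00:00" +%s)"
}

# Example: compute the date for one archive entry and export it for Git.
GIT_AUTHOR_DATE=$(utc_git_date 2020-02-18)
export GIT_AUTHOR_DATE
echo "$GIT_AUTHOR_DATE"   # prints "1581984000 +0000"
```

In the commit loop this would replace the
GIT_AUTHOR_DATE="${ymd}T00:00:00" assignment with
GIT_AUTHOR_DATE=`utc_git_date "$ymd"`. The script only sets the author
date, so the committer date of each commit remains the time the script
was run; exporting GIT_COMMITTER_DATE with the same value would
backdate that as well, if you want it.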