Ian Campbell <[email protected]> writes:
> Allowing for incremental updates of large trees.
"by doing what" is missing. And ...
>
> I have been using this as part of the device tree extraction from the Linux
> kernel source since 2013, about time I sent the patch upstream!
... this does not help understanding what is going on. It belongs
to the space after three dashes.
Perhaps
Subject: filter-branch: stash away ref map in a branch
With "--state-branch=<branchname>" option, the mapping from
old object names and filtered ones in ./map/ directory is
stashed away in the object database, and the one from the
previous run is read to populate the ./map/ directory,
allowing for incremental updates of large trees.
or something?
>
> Signed-off-by: Ian Campbell <[email protected]>
> ---
> git-filter-branch.sh | 39 ++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 38 insertions(+), 1 deletion(-)
>
> diff --git a/git-filter-branch.sh b/git-filter-branch.sh
> index 3a74602ef..d07db3fee 100755
> --- a/git-filter-branch.sh
> +++ b/git-filter-branch.sh
> @@ -86,7 +86,7 @@ USAGE="[--setup <command>] [--env-filter <command>]
> [--parent-filter <command>] [--msg-filter <command>]
> [--commit-filter <command>] [--tag-name-filter <command>]
> [--subdirectory-filter <directory>] [--original <namespace>]
> - [-d <directory>] [-f | --force]
> + [-d <directory>] [-f | --force] [--state-branch <branch>]
> [--] [<rev-list options>...]"
>
> OPTIONS_SPEC=
> @@ -106,6 +106,7 @@ filter_msg=cat
> filter_commit=
> filter_tag_name=
> filter_subdir=
> +state_branch=
> orig_namespace=refs/original/
> force=
> prune_empty=
> @@ -181,6 +182,9 @@ do
> --original)
> orig_namespace=$(expr "$OPTARG/" : '\(.*[^/]\)/*$')/
> ;;
> + --state-branch)
> + state_branch="$OPTARG"
> + ;;
> *)
> usage
> ;;
> @@ -252,6 +256,20 @@ export GIT_INDEX_FILE
> # map old->new commit ids for rewriting parents
> mkdir ../map || die "Could not create map/ directory"
>
> +if [ -n "$state_branch" ] ; then
> + state_commit=`git show-ref -s "$state_branch"`
I hate to nitpick styles, especially on this script that already has
existing violations, but for completeness:
Style: we prefer to write $(command substitution) instead.
Style: we prefer to write "if test", not "if [".
Style: we prefer to avoid ';' and write "if test condtion" and
"then" on different lines.
It is a bit curious use of "show-ref". It is not wrong per-se, but
"git rev-parse" may be more common. I do not care too deeply either
way, though.
Don't we want to make sure the value given to --state-branch is a
full refname, not just a branch name? What happens when you say
filter-branch --state-branch master
by mistake? "show-ref -s" is likely to show your refs/heads/master,
and other master branches that appear as remote-tracking branches for
the remotes you interact with.
> + if [ -n "$state_commit" ] ; then
> + echo "Populating map from $state_branch ($state_commit)" 1>&2
> + git show "$state_commit":filter.map |
> + perl -n -e 'm/(.*):(.*)/ or die;
> + open F, ">../map/$1" or die;
> + print F "$2" or die;
> + close(F) or die'
The process calling this perl script, which carefully diagnoses
malformed input and dies, does not seem to do anything when it sees
errors. Intended?
> + else
> + echo "Branch $state_branch does not exist. Will create" 1>&2
> + fi
> +fi
> +
> # we need "--" only if there are no path arguments in $@
> nonrevs=$(git rev-parse --no-revs "$@") || exit
> if test -z "$nonrevs"
> @@ -544,6 +562,25 @@ if [ "$filter_tag_name" ]; then
> done
> fi
>
> +if [ -n "$state_branch" ] ; then
> + echo "Saving rewrite state to $state_branch" 1>&2
> + STATE_BLOB=$(ls ../map |
> + perl -n -e 'chomp();
> + open F, "<../map/$_" or die;
> + chomp($f = <F>); print "$_:$f\n";' |
I see it somewhat gross to pipe the output of "/bin/ls" to a Perl
script, instead of iterating over "while (<../map/*>)" inside the
script itself.
> + git hash-object -w --stdin )
> + STATE_TREE=$(/bin/echo -e "100644 blob $STATE_BLOB\tfilter.map" | git
> mktree)
> + STATE_PARENT=$(git show-ref -s "$state_branch")
Don't you already have this in $state_commit?
One advantage of reading $state_branch again at this point is to
detect mistakes of running more than one filter-branch (which may
cause you to read $STATE_PARENT that is different from $state_commit
you read earlier), but I do not think that is being done here, so...
> + unset GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_AUTHOR_DATE
> + unset GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL GIT_COMMITTER_DATE
Hmph. I can see that you are trying not to be affected by the
committers and authors of the commits on the branch being filtered
(which are set by finish_ident shell function), but I wonder if we
could (and more importantly "want to") do better to preserve the
real committer the user who runs the script may have in the
environment before running it. I guess it does not matter that
much, as long as the user has properly user.{name,email} configured
elsewhere without relying on the environment variable.
> + if [ -n "$STATE_PARENT" ] ; then
> + STATE_COMMIT=$(/bin/echo "Sync" | git commit-tree "$STATE_TREE" -p
> "$STATE_PARENT")
> + else
> + STATE_COMMIT=$(/bin/echo "Sync" | git commit-tree "$STATE_TREE" )
> + fi
> + git update-ref "$state_branch" "$STATE_COMMIT"
> +fi
> +
> cd "$orig_dir"
> rm -rf "$tempdir"
Despite all the above comments, I like what you are trying to
achieve here. Thanks for sharing.