Nguyễn Thái Ngọc Duy  <pclo...@gmail.com> writes:

> Commit 57ea712 (git.c: make sure we do not leak GIT_* to alias scripts -
> 2015-12-20) does not realize that handle_alias() can be called multiple
> times because of the forever loop in run_argv(). The commit breaks alias
> chains.
>
> Suppose you have an alias "abc" that resolves to another alias "def",
> which finally resolve to "git status". handle_alias() is called twice.
> save_env() and restore_env() are also called twice. But because of the
> check save_env_before_alias in save_env(), we save once while trying to
> restore twice. Consequences are left for the reader's imagination.
>
> Fortunately, you cannot make an alias of another alias. At least not
> yet. Unfortunately it can still happen with help.autocorrect, where your
> alias typo is treated as the first "alias", and it can be resolved to
> the second alias. Then boom.
>
> Make sure we call save_env() and restore_env() in pairs. While at there,
> set orig_cwd to NULL after freeing it for hygiene.
>
> Reported-by: Michael J Gruber <g...@drmicha.warpmail.net>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclo...@gmail.com>
> ---

I spoke too soon, I am afraid.

The log message talks about "we save once while trying to restore
twice" being bad, and the fix is to make sure we save and restore in
pairs.  Before or after the patch, however, there is no change in
the callers of save_env_before_alias() and restore_env(), and it is
unclear how that is ensured.

It turns out that the callers are doing the right thing (assuming
that calling save and restore in pairs is the right solution), and
the culprit is in the save_env_before_alias() function that returns
without saving when we were called.  That is mentioned in the log,
but the description of last paragraph leaves the impression that it
was callers that needed fixing, which is misleading.

After the patch, we have "saved" and "saved_env_before_alias".

 - The former is a protection against committing similar programming
   errors in the future (i.e. glorified assert()).  Can we have it
   in a separate commit, perhaps on top of the real fix, to make the
   change for the real fix easier to understand?

 - Do we still need the latter?  saved_env_before_alias is set to 1
   if we called save_env_before_alias() even once, which happens
   always (and only) at the beginning of handle_alias(), so that is
   equivalent to saying "have we ever called handle_alias()?"

   And there is only one caller of handle_alias, in run_argv(), and
   it maintains yet another variable "done_alias" to keep track of
   the same information.

   The only code that cares about saved_env_before_alias is
   handle_builtin(), but it is a glorified no-op if
   saved_env_before_alias() is set.  And the only caller of this
   handle_builtin(), for which saved_env_before_alias matters, is
   the same run_argv().

I wonder if this would be better done as a multi-part series that
goes like this:

 [1/3] git: remove an early return from save_env_before_alias()

       whose description would be the first two paragraphs of your
       patch, with two line removal, i.e.

        diff --git a/git.c b/git.c
        index 98d4412..a57a4cb 100644
        --- a/git.c
        +++ b/git.c
        @@ -30,8 +30,6 @@ static int saved_env_before_alias;
         static void save_env_before_alias(void)
         {
                int i;
        -       if (saved_env_before_alias)
        -               return;
                saved_env_before_alias = 1;
                orig_cwd = xgetcwd();
                for (i = 0; i < ARRAY_SIZE(env_names); i++) {

       that should be a sufficient "fix" to the issue.

 [2/3] git: protect against unbalanced calls to {save,restore}_env()

       We made sure that save_env_before_alias() does not skip
       saving the environment when asked to (which led to double
       free of orig_cwd in restore_env()) with the previous step.
       Protect against future breakage where somebody adds new
       callers of these functions in an unbalanced fashion.
       
        diff --git a/git.c b/git.c
        index a57a4cb..e39b972 100644
        --- a/git.c
        +++ b/git.c
        @@ -26,11 +26,15 @@ static const char *env_names[] = {
         };
         static char *orig_env[4];
         static int saved_env_before_alias;
        +static int save_restore_env_balance;

         static void save_env_before_alias(void)
         {
                int i;
                saved_env_before_alias = 1;
        +
        +       assert(save_restore_env_balance == 0);
        +       save_restore_env_balance = 1;
                orig_cwd = xgetcwd();
                for (i = 0; i < ARRAY_SIZE(env_names); i++) {
                        orig_env[i] = getenv(env_names[i]);
        @@ -42,6 +46,9 @@ static void save_env_before_alias(void)
         static void restore_env(int external_alias)
         {
                int i;
        +
        +       assert(save_restore_env_balance == 1);
        +       save_restore_env_balance = 0;
                if (!external_alias && orig_cwd && chdir(orig_cwd))
                        die_errno("could not move to %s", orig_cwd);
                free(orig_cwd);

 [3/3] git: simplify environment save/restore logic

       The only code that cares about the value of the global
       variable saved_env_before_alias after the previous fix is
       handle_builtin() that turns into a glorified no-op when the
       variable is true, so the logic could safely be lifted to its
       caller, i.e. the caller can refrain from calling it when the
       variable is set.

       This variable tells us if save_env_before_alias() was called
       (with or without matching restore_env()), but the sole caller
       of the function, handle_alias(), always calls it the first as
       thing, so it essentially keeps track of the fact that
       handle_alias() has ever been called.

       It turns out that handle_builtin() and handle_alias() are
       called only from one function in a way that the value of the
       variable matters, which is run_argv(), and it already keeps
       track of the fact that it called handle_alias().

       So we can simplify the whole thing by:

       - Change handle_builtin() to always make a direct call to the
         builtin implementation it finds, and make sure the caller
         refrains from calling it if handle_alias() has ever been
         called;

       - Remove saved_env_before_alias variable, and instead use the
         local "done_alias" variable maintained inside run_argv() to
         make the above decision.

 [4/3] git: plug leaks of saved environments

       I think orig_env[] members if save_env() is called twice are
       leaked, so they would need to be free'd just like orig_cwd is
       freed in restore_env().  Optionally, you can assign NULL to
       them and also to orig_cwd after freeing.



The patch text for [3/3] would look like this.

 git.c | 29 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/git.c b/git.c
index e39b972..c8d7b56 100644
--- a/git.c
+++ b/git.c
@@ -25,13 +25,11 @@ static const char *env_names[] = {
        GIT_PREFIX_ENVIRONMENT
 };
 static char *orig_env[4];
-static int saved_env_before_alias;
 static int save_restore_env_balance;
 
 static void save_env_before_alias(void)
 {
        int i;
-       saved_env_before_alias = 1;
 
        assert(save_restore_env_balance == 0);
        save_restore_env_balance = 1;
@@ -533,16 +531,8 @@ static void handle_builtin(int argc, const char **argv)
        }
 
        builtin = get_builtin(cmd);
-       if (builtin) {
-               /*
-                * XXX: if we can figure out cases where it is _safe_
-                * to do, we can avoid spawning a new process when
-                * saved_env_before_alias is true
-                * (i.e. setup_git_dir* has been run once)
-                */
-               if (!saved_env_before_alias)
-                       exit(run_builtin(builtin, argc, argv));
-       }
+       if (builtin)
+               exit(run_builtin(builtin, argc, argv));
 }
 
 static void execv_dashed_external(const char **argv)
@@ -586,8 +576,17 @@ static int run_argv(int *argcp, const char ***argv)
        int done_alias = 0;
 
        while (1) {
-               /* See if it's a builtin */
-               handle_builtin(*argcp, *argv);
+               /*
+                * If we tried alias and futzed with our environment,
+                * it no longer is safe to invoke builtins directly in
+                * general.  We have to spawn them as dashed externals.
+                *
+                * NEEDSWORK: if we can figure out cases
+                * where it is safe to do, we can avoid spawning a new
+                * process.
+                */
+               if (!done_alias)
+                       handle_builtin(*argcp, *argv);
 
                /* .. then try the external ones */
                execv_dashed_external(*argv);
@@ -598,9 +597,9 @@ static int run_argv(int *argcp, const char ***argv)
                 */
                if (done_alias)
                        break;
+               done_alias = 1;
                if (!handle_alias(argcp, argv))
                        break;
-               done_alias = 1;
        }
 
        return done_alias;
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to