On Tue, Nov 16, 2021 at 6:38 PM Huji Lee <[email protected]> wrote:
>
> I went back and reactivated the line in .bash_profile which enabled zsh 
> ("exec zsh" as the last line of .bash_profile)
>
> Then I submitted the job to the grid, using a command like this:
>
> jsub -N "n"  -once -o ~/err/nightly.out -e ~/err/nightly.err 
> ~/grid/jobs/nightly.sh
>
> I did it three ways. First, I used the nightly.sh file as is (see source). 
> Second, I replaced "source" with "." and third I replaced "source" with 
> "bash". In all three cases, it failed, without even producing an output or 
> error. The nightly.out and nightly.err files were created of course, but were 
> empty.
>
> Next, I added a "#!/bin/bash" shebang and ran it again all three ways. Result 
> was the same.
>
> Running qstat many times shows that the job gets into a queued state ("qw") 
> and after a few seconds, it goes into the run state ("r") and immediately 
> stops.
>
> Removing the "exec zsh" command from .bash_profile will make things work 
> again.
>
> Finally, I decided maybe the problem is that zsh is available for me, but not 
> on the grid. So I changed the .bash_profile ending from a single "exec zsh" 
> command to this:
>
> if [ -f /usr/bin/zsh ]; then
>     zsh
> fi
>
> Under this config, jobs on the grid worked, and when I used "become" to login 
> as my tool, I ended with zsh. Obviously, I am happy with this workaround. But 
> I am still curious as to the root cause.
>
> Is it really that zsh is not available on the grid, and the grid tries to 
> replicate my environment first and reaches the "exec zsh" command and falls 
> apart somehow?
>

This is consistent with what I described earlier:

> Since you have "exec zsh" in your
> .bash_profile, bash will run it at startup as a login shell, which in
> theory would immediately replace itself with zsh with no arguments.
> zsh will then see it has no arguments, attempt to read a script from
> stdin, get nothing, and immediately exit, stopping the job on the grid.

However, now that you have "zsh" instead of "exec zsh", the "replace"
is not done. bash as the login shell runs zsh as a child process, and
zsh, having no input, immediately exits. bash then continues as
if nothing had ever happened.
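
To make the difference concrete, here is a minimal sketch you can run
anywhere. It assumes `cat` as a stand-in for zsh (both read stdin and
exit at end-of-file), and the echoed message is only a placeholder for
"whatever comes after the profile", not anything the grid prints:

```shell
# cat stands in for zsh: given an empty stdin, it exits right away.
# With exec, bash is replaced, so the echo after it never runs.
with_exec=$(bash -c 'exec cat; echo "rest of the job ran"' </dev/null)

# Without exec, cat is a child process; bash carries on after it exits.
without_exec=$(bash -c 'cat; echo "rest of the job ran"' </dev/null)

printf 'with exec:    [%s]\n' "$with_exec"
printf 'without exec: [%s]\n' "$without_exec"
```

The first result should be empty and the second should contain the
message, which matches what you saw with "exec zsh" versus plain "zsh".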

I just tested how bash invokes .bash_profile by adding a "sleep 60" to
.bash_profile and giving my test.sh a shebang. I submitted the job both
with an explicit 'bash' and without, and it looks like .bash_profile is
executed in both cases:

  USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
  sgeadmin   762  0.4  0.1 111020 16056 ?        Sl   Mar25 1383:08 /usr/lib/gridengine/sge_execd
  [...]
  sgeadmin 20388  0.0  0.1  51468  8540 ?        S    07:57   0:00  \_ /usr/lib/gridengine/sge_shepherd -bg
  tools.z+ 20390  0.0  0.0  23580  3196 ?        Ss   07:57   0:00      \_ -bash -c /data/project/zhuyifei1999-test/test.sh
  tools.z+ 20393  0.0  0.0   5796   672 ?        S    07:57   0:00          \_ sleep 60

  USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
  sgeadmin   752  0.3  0.1 115112 16100 ?        Sl   Mar25 1313:16 /usr/lib/gridengine/sge_execd
  [...]
  sgeadmin  8715  0.0  0.1  51468  8688 ?        S    07:57   0:00  \_ /usr/lib/gridengine/sge_shepherd -bg
  tools.z+  8717  0.0  0.0  23580  3324 ?        Ss   07:57   0:00      \_ -bash -c /bin/bash /data/project/zhuyifei1999-test/test.sh
  tools.z+  8720  0.0  0.0   5796   656 ?        S    07:57   0:00          \_ sleep 60

It did take me by surprise that it's still bash that invokes the given
command, because bash was not in the process tree for a usual "jsub
[...] python script.sh". For example, a non-continuous job typically
looks like this:

  sgeadmin 28386  0.0  0.1  51468  8588 ?        S    Nov15   0:00  \_ /usr/lib/gridengine/sge_shepherd -bg
  tools.f+ 28388  7.2  3.5 427144 293024 ?       Ss   Nov15 210:55  |   \_ /usr/bin/python pycore/pwb.py pycore/fawikibot/rade.py -newcat:10

And a continuous one:

  sgeadmin  3699  0.0  0.0  51464  4540 ?        S    Apr19   0:00  \_ /usr/lib/gridengine/sge_shepherd -bg
  tools.b+  3701  0.0  0.0   4280    68 ?        SNs  Apr19   0:00  |   \_ /bin/sh /var/spool/gridengine/execd/tools-sgeexec-0942/job_scripts/1302451
  tools.b+  3702  0.2  2.8 505104 231092 ?       SNl  Apr19 674:45  |       \_ /usr/bin/python bot2.py

There is no `-bash -c "python script.sh"` in either tree.

However, if you trace what's going on, for a non-interactive bash that
only receives a single command, it will directly execve that command:

  $ strace -e clone,execve bash -c '/bin/true'
  execve("/bin/bash", ["bash", "-c", "/bin/true"], [/* 26 vars */]) = 0
  execve("/bin/true", ["/bin/true"], [/* 25 vars */]) = 0
  +++ exited with 0 +++

It does not involve child processes from the fork-exec model you'd
expect. Therefore, we can say that no matter what you do with the job
submission, a bash non-interactive login shell will be executed to run
the command you specified to jsub. And the mess of "bash replaces
itself with zsh, which immediately exits because stdin is empty" will
apply.

I think it is important to clarify that a shell like bash has 4 modes
of execution, defined by whether it is an interactive shell and
whether it is a login shell. The details for bash are in its man
page [1]. But tl;dr:

Login shells:
- Upon startup, sources /etc/profile, then the first one among
~/.bash_profile, ~/.bash_login, and ~/.profile that exists.
- `bash -l` and `-bash` (note the dash at the front) make bash a
login shell

Non-login shells:
- If also interactive, upon startup, sources ~/.bashrc

Interactive shells:
- Displays a prompt for each command

Non-interactive shells:
- Upon startup, sources the file named by $BASH_ENV, if it is set
- As we saw above, if the command is given in the command string in -c
and there is only one command, bash does not fork-exec the command but
execs the command directly.
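
If you want to check which of the modes a given bash is in, a small
sketch like the following may help. It relies on bash's `login_shell`
shopt and the `i` flag in `$-`; the variable names are made up for
illustration, and `--noprofile --norc` are passed only so that the demo
does not read your real startup files:

```shell
# Commands that report the two properties deciding which files bash reads.
describe_shell='
  shopt -q login_shell && login=login || login=non-login
  [[ $- == *i* ]] && inter=interactive || inter=non-interactive
  echo "$login $inter"
'

# Ordinary "bash -c": neither login nor interactive.
plain_mode=$(bash --noprofile --norc -c "$describe_shell")

# "bash -l -c": a non-interactive login shell, like the "-bash -c" on the grid.
login_mode=$(bash --noprofile --norc -l -c "$describe_shell")

echo "bash -c:    $plain_mode"
echo "bash -l -c: $login_mode"
```

The second case is the combination the grid uses, which is why the
profile gets sourced even though nobody is sitting at a prompt.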

So you might wonder why there is a separation of login shells (profile)
vs non-login shells (rc). The reason is that some settings are inherited
by subshells while others are not. Environment variables are
inherited:

  $ export FOO=bar
  $ echo $FOO
  bar
  $ bash
  $ echo $FOO
  bar

While things like aliases are not:

  $ alias foo='echo bar'
  $ foo
  bar
  $ bash
  $ foo
  bash: foo: command not found

There are environment setups that get inherited but that you do not
want executed over and over by subshells, for example appending to
$PATH (`export PATH="$PATH:/path/to/bin"`). If it is in rc instead of
profile, every time you run an interactive bash subshell PATH gets
longer and more redundant; hence $PATH setups normally go to profile
instead of rc. Non-inheritable setups like aliases go to rc. And the
separation between .bash_profile and .profile is just so that you can
have a .bash_profile that uses bash-specific syntax. I never needed
any so I always use .profile.
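
The growing-PATH effect can be reproduced without touching your real
dotfiles. This is only a sketch under some assumptions: it uses
$BASH_ENV to make non-interactive shells source a temporary file (in
place of ~/.bashrc for interactive ones), and DEMO_PATH is a made-up
variable standing in for PATH:

```shell
# A throwaway "rc" file that appends to an inherited variable.
rc=$(mktemp)
echo 'export DEMO_PATH="$DEMO_PATH:/path/to/bin"' > "$rc"

# Two nested shells: each one sources $rc at startup, and each one
# inherits the already-extended value from its parent.
nested=$(DEMO_PATH=/usr/bin BASH_ENV="$rc" bash -c 'bash -c "echo \$DEMO_PATH"')

echo "$nested"
rm -f "$rc"
```

With two levels of nesting the suffix should show up twice, once per
shell that sourced the file.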

And to have bash login shells also get the initialization from rc,
.profile usually has a header like this:

  # if running bash
  if [ -n "$BASH_VERSION" ]; then
      # include .bashrc if it exists
      if [ -f "$HOME/.bashrc" ]; then
          . "$HOME/.bashrc"
      fi
  fi

And .bashrc:

  # Test for an interactive shell
  if [[ $- != *i* ]] ; then
          # Shell is non-interactive.  Be done now!
          return
  fi

I hope this makes sense. Let me know if not.

Back to your question, let's see in what scenarios you would want to invoke zsh:
- Non-interactive shells: No, you don't want `bash command.sh` randomly exec zsh
- Interactive non-login shells: No, if you explicitly run `bash`, you
want bash not zsh.
- Interactive login shells: Yes, this is what `become tool` runs
initially and you want zsh here.

Hence, to run in a login shell environment you'd want .profile or
.bash_profile. And the interactive guard is simply [[ $- = *i* ]] in bash
syntax, so what you want, expressed in code, is this in .bash_profile:

  if [[ $- = *i* ]]; then
          exec zsh
  fi
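
If you would also like to keep your original "is zsh even there" check,
the two conditions can be combined. The sketch below is one possible
variant, not something the grid requires: the `command -v zsh` test is
my addition, and the echoed message is a placeholder that exists only so
the non-interactive case can be observed:

```shell
# The guard as it might appear at the end of .bash_profile, plus a marker.
snippet='
if [[ $- == *i* ]] && command -v zsh >/dev/null 2>&1; then
    exec zsh
fi
echo "guard skipped, still in bash"
'

# Run it the way a grid job would: non-interactive, with an empty stdin.
result=$(bash --noprofile --norc -c "$snippet" </dev/null)
echo "$result"
```

In a non-interactive shell the guard is false regardless of whether zsh
is installed, so bash keeps going and the job runs.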

As a side note, yes zsh exists on the grid hosts:

  zhuyifei1999@tools-sgeexec-0901: ~$ ls -l {/usr,}/bin/zsh
  -rwxr-xr-x 1 root root 819744 Dec  1  2020 /bin/zsh
  lrwxrwxrwx 1 root root      8 Nov 22  2018 /usr/bin/zsh -> /bin/zsh

[1] https://man7.org/linux/man-pages/man1/bash.1.html#INVOCATION

YiFei Zhu
_______________________________________________
Cloud mailing list -- [email protected]
List information: 
https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
