On Thu, Dec 2, 2021 at 7:14 PM Huji Lee wrote:
> I had to read through your email a few times to fully understand it. You
> provided lots of useful information; thank you!

Yep, things like terminals and shells are more complicated than they look ;)

> I tried changing the code in my .bash_profile to what you suggested; after
> logging out and logging back in, zsh was my shell in interactive mode. I then
> submitted a job via jsub and that also seemed to work correctly. In short, it
> seems like what you suggested takes care of my problem. I will let you know
> if I find any evidence otherwise.

Sounds good to me.

YiFei Zhu

> On Tue, Nov 23, 2021 at 12:06 PM YiFei Zhu wrote:
>>
>> On Wed, Nov 17, 2021 at 1:04 AM YiFei Zhu wrote:
>> > On Tue, Nov 16, 2021 at 6:38 PM Huji Lee wrote:
>> > >
>> > > I went back and reactivated the line in .bash_profile that enabled zsh
>> > > ("exec zsh" as the last line of .bash_profile).
>> > >
>> > > Then I submitted the job to the grid, using a command like this:
>> > >
>> > > jsub -N "n" -once -o ~/err/nightly.out -e ~/err/nightly.err ~/grid/jobs/nightly.sh
>> > >
>> > > I did it three ways. First, I used the nightly.sh file as is (see
>> > > source). Second, I replaced "source" with ".", and third, I replaced
>> > > "source" with "bash". In all three cases it failed, without even
>> > > producing any output or error. The nightly.out and nightly.err files
>> > > were created, of course, but were empty.
>> > >
>> > > Next, I added a "#!/bin/bash" shebang and ran it again all three ways.
>> > > The result was the same.
>> > >
>> > > Running qstat many times shows that the job gets into the queued state
>> > > ("qw"), and after a few seconds it goes into the run state ("r") and
>> > > immediately stops.
>> > >
>> > > Removing the "exec zsh" command from .bash_profile makes things work
>> > > again.
>> > >
>> > > Finally, I decided maybe the problem is that zsh is available for me,
>> > > but not on the grid.
>> > > So I changed the .bash_profile ending from a single "exec zsh"
>> > > command to this:
>> > >
>> > >     if [ -f /usr/bin/zsh ]; then
>> > >         zsh
>> > >     fi
>> > >
>> > > Under this config, jobs on the grid worked, and when I used "become" to
>> > > log in as my tool, I ended up with zsh. Obviously, I am happy with this
>> > > workaround, but I am still curious about the root cause.
>> > >
>> > > Is it really that zsh is not available on the grid, and the grid tries
>> > > to replicate my environment first, reaches the "exec zsh" command, and
>> > > falls apart somehow?
>> > >
>> >
>> > This is consistent with what I described earlier:
>> >
>> > > Since you have "exec zsh" in your .bash_profile, bash will run it at
>> > > startup as a login shell, which in theory would immediately replace
>> > > itself with zsh with no arguments. zsh will then see it has no
>> > > arguments, attempt to read a script from stdin, get nothing, and
>> > > immediately exit, stopping the job on the grid.
>> >
>> > However, now that you have "zsh" instead of "exec zsh", the replacement
>> > does not happen. bash, as the login shell, runs zsh as a child process;
>> > zsh, having no input, immediately exits, and execution continues as if
>> > nothing had ever happened.
>> >
>> > I just tested how bash invokes .bash_profile: I added a "sleep 60" to
>> > .bash_profile, gave my test.sh a shebang, and submitted a job for it
>> > both with an explicit 'bash' and without. It looks like .bash_profile
>> > is executed in both cases:
>> >
>> > USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
>> > sgeadmin 762 0.4 0.1 111020 16056 ? Sl Mar25 1383:08 /usr/lib/gridengine/sge_execd
>> > [...]
>> > sgeadmin 20388 0.0 0.1 51468 8540 ? S 07:57 0:00  \_ /usr/lib/gridengine/sge_shepherd -bg
>> > tools.z+ 20390 0.0 0.0 23580 3196 ? Ss 07:57 0:00      \_ -bash -c /data/project/zhuyifei1999-test/test.sh
>> > tools.z+ 20393 0.0 0.0 5796 672 ? S 07:57 0:00          \_ sleep 60
>> >
>> > USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
>> > sgeadmin 752 0.3 0.1 115112 16100 ? Sl Mar25 1313:16 /usr/lib/gridengine/sge_execd
>> > [...]
>> > sgeadmin 8715 0.0 0.1 51468 8688 ? S 07:57 0:00  \_ /usr/lib/gridengine/sge_shepherd -bg
>> > tools.z+ 8717 0.0 0.0 23580 3324 ? Ss 07:57 0:00      \_ -bash -c /bin/bash /data/project/zhuyifei1999-test/test.sh
>> > tools.z+ 8720 0.0 0.0 5796 656 ? S 07:57 0:00          \_ sleep 60
>> >
>> > It did take me by surprise that it's still bash that invokes the given
>> > command, because bash is not in the process tree of a usual "jsub
>> > [...] python script.sh". For example, a non-continuous job typically
>> > looks like this:
>> >
>> > sgeadmin 28386 0.0 0.1 51468 8588 ? S Nov15 0:00  \_ /usr/lib/gridengine/sge_shepherd -bg
>> > tools.f+ 28388 7.2 3.5 427144 293024 ? Ss Nov15 210:55  |   \_ /usr/bin/python pycore/pwb.py pycore/fawikibot/rade.py -newcat:10
>> >
>> > And a continuous one:
>> >
>> > sgeadmin 3699 0.0 0.0 51464 4540 ? S Apr19 0:00  \_ /usr/lib/gridengine/sge_shepherd -bg
>> > tools.b+ 3701 0.0 0.0 4280 68 ? SNs Apr19 0:00  |   \_ /bin/sh /var/spool/gridengine/execd/tools-sgeexec-0942/job_scripts/1302451
>> > tools.b+ 3702 0.2 2.8 505104 231092 ? SNl Apr19 674:45  |       \_ /usr/bin/python bot2.py
>> >
>> > There is no `-bash -c "python script.sh"` in either tree.
>> >
>> > However, if you trace what's going on, you'll see that a non-interactive
>> > bash that receives only a single command execve()s that command
>> > directly:
>> >
>> > $ strace -e clone,execve bash -c '/bin/true'
>> > execve("/bin/bash", ["bash", "-c", "/bin/true"], [/* 26 vars */]) = 0
>> > execve("/bin/true", ["/bin/true"], [/* 25 vars */]) = 0
>> > +++ exited with 0 +++
>> >
>> > No child process is forked, unlike the fork-exec model you'd expect;
>> > the bash process simply turns into the command, which is why bash does
>> > not show up in the trees of the usual jobs above.
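The two effects described above (exec replacing the login shell versus running zsh as a child, and `bash -c` exec'ing a lone command in place) can be reproduced locally. This is a minimal bash sketch, not something from the thread: it assumes a Linux host (it reads `/proc`), uses bash as a stand-in for zsh because any shell started with no script and an empty stdin exits at once, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only. Assumptions: Linux (/proc is read below), and bash stands in
# for zsh: any shell started with no script and an empty stdin exits at once.

# Like a .bash_profile ending in "exec zsh": the shell replaces itself, so the
# job's real command (the echo) never runs.
with_exec() {
  bash -c 'exec bash </dev/null; echo "job command ran"'
}

# Like a .bash_profile ending in plain "zsh": the new shell is a child process
# that hits EOF and exits; the original shell then carries on with the job.
without_exec() {
  bash -c 'bash </dev/null; echo "job command ran"'
}

# bash -c with a single simple command execs it in place (no fork). $$ is
# expanded before that exec, so /proc/$$/comm names the command itself.
single_command_name() { bash -c 'cat /proc/$$/comm'; }

# With a second command after it, bash has to fork, so $$ still names bash.
two_command_name() { bash -c 'cat /proc/$$/comm; true'; }

echo "with exec: [$(with_exec)]"              # with exec: []
echo "without exec: [$(without_exec)]"        # without exec: [job command ran]
echo "one command: $(single_command_name)"    # one command: cat
echo "two commands: $(two_command_name)"      # two commands: bash
```

The empty brackets in the exec case mirror the empty nightly.out and nightly.err files reported earlier in the thread.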
>> >
>> > Therefore, no matter how you submit the job, a non-interactive bash
>> > login shell is executed to run the command you gave to jsub, and the
>> > mess of "bash replaces itself with zsh, which immediately exits because
>> > stdin is empty" applies.
>> >
>> > I think it is important to clarify that a shell like bash has four
>> > modes of execution, determined by whether it is interactive and
>> > whether it is a login shell. You can find the details for bash in its
>> > man page [1], but tl;dr:
>> >
>> > Login shells:
>> > - Upon startup, source /etc/profile, then the first of ~/.bash_profile,
>> >   ~/.bash_login, and ~/.profile that exists.
>> > - `bash -l`, or a bash started as `-bash` (note the dash at the front,
>> >   as in the ps output above), is a login shell.
>> >
>> > Non-login shells:
>> > - If also interactive, source ~/.bashrc upon startup.
>> >
>> > Interactive shells:
>> > - Display a prompt for each command.
>> >
>> > Non-interactive shells:
>> > - Upon startup, source the file named by $BASH_ENV, if it is set.
>> > - As we saw above, if the command is given as the -c command string
>> >   and there is only one command, bash does not fork-exec the command
>> >   but execs it directly.
>> >
>> > So you might wonder why login shells (profile) and non-login shells
>> > (rc) are kept separate. The reason is that some settings are inherited
>> > by subshells while others are not. Environment variables are
>> > inherited:
>> >
>> > $ export FOO=bar
>> > $ echo $FOO
>> > bar
>> > $ bash
>> > $ echo $FOO
>> > bar
>> >
>> > Things like aliases are not:
>> >
>> > $ alias foo='echo bar'
>> > $ foo
>> > bar
>> > $ bash
>> > $ foo
>> > bash: foo: command not found
>> >
>> > Some setups are inherited, yet you do not want them to run over and
>> > over in every subshell, for example appending to $PATH
>> > (`export PATH="$PATH:/path/to/bin"`).
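The inheritance rules above, and the way a PATH append piles up when every new shell re-runs it, can be checked with a short bash sketch. It is an illustration rather than anything from the thread: the rc file is a hypothetical temporary file, and nested subshells that each source it stand in for nested interactive shells that each read ~/.bashrc.

```bash
#!/usr/bin/env bash
# Exported variables are inherited by child processes, including new shells.
export FOO=bar
bash -c 'echo "child: FOO=$FOO"'                 # child: FOO=bar

# Aliases belong to the shell that defined them; a child shell does not get them.
alias foo='echo bar'
alias foo                                        # alias foo='echo bar'
bash -c 'alias foo' 2>/dev/null || echo "child: no alias foo"

# Hypothetical rc file containing a PATH append, written to a temp file.
rc=$(mktemp)
trap 'rm -f "$rc"' EXIT
printf '%s\n' 'export PATH="$PATH:/path/to/bin"' >"$rc"

# Simulate $1 nested shells that each re-read the rc file on startup. Each
# subshell inherits the already-extended PATH and then appends to it again.
path_after_nesting() {
  if (( $1 == 0 )); then
    printf '%s\n' "$PATH"
  else
    ( . "$rc"; path_after_nesting $(( $1 - 1 )) )
  fi
}

# Count how many copies of /path/to/bin end up in PATH.
count_entries() {
  path_after_nesting "$1" | tr ':' '\n' | grep -cx '/path/to/bin'
}

echo "copies after 3 nested shells: $(count_entries 3)"   # copies after 3 nested shells: 3
```

Putting the same line in a profile file instead means it runs once per login, and child shells simply inherit the result.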
>> >
>> > If that PATH line is in rc instead of profile, PATH gets longer and
>> > more redundant every time you run an interactive bash subshell; hence
>> > $PATH setups normally go in profile rather than rc. Non-inheritable
>> > setups like aliases go in rc. The separation between .bash_profile and
>> > .profile exists so that you can have a .bash_profile that uses
>> > bash-specific syntax, while .profile stays readable by other sh-style
>> > shells. I never needed any, so I always use .profile.
>> >
>> > And to have bash login shells also get the initialization from rc,
>> > .profile usually has a header like this:
>> >
>> >     # if running bash
>> >     if [ -n "$BASH_VERSION" ]; then
>> >         # include .bashrc if it exists
>> >         if [ -f "$HOME/.bashrc" ]; then
>> >             . "$HOME/.bashrc"
>> >         fi
>> >     fi
>> >
>> > And .bashrc typically begins with a guard like this:
>> >
>> >     # Test for an interactive shell
>> >     if [[ $- != *i* ]] ; then
>> >         # Shell is non-interactive. Be done now!
>> >         return
>> >     fi
>> >
>> > I hope this makes sense. Let me know if not.
>> >
>> > Back to your question, let's see in which scenarios you would want to
>> > invoke zsh:
>> > - Non-interactive shells (including the "-bash -c" the grid runs): No,
>> >   you don't want `bash command.sh` to randomly exec zsh.
>> > - Interactive non-login shells: No, if you explicitly run `bash`, you
>> >   want bash, not zsh.
>> > - Interactive login shells: Yes, this is what `become tool` runs
>> >   initially, and you want zsh here.
>> >
>> > Hence, since the only case that needs it is a login shell, the code
>> > belongs in .profile or .bash_profile.
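To see which of the four modes a given invocation lands in, you can ask bash itself: `shopt -q login_shell` succeeds only in a login shell, and `$-` contains `i` only in an interactive one. This is a sketch, not part of the thread; the helper name `mode_of` is hypothetical, and `--norc --noprofile` keep it from reading your real startup files (which, in this thread's case, might themselves `exec zsh`).

```bash
#!/usr/bin/env bash
# Hypothetical helper: report the mode of a bash started with the given flags.
# --norc/--noprofile (long options must come before -i/-l) skip the real
# startup files; stderr is discarded because an interactive bash without a
# terminal warns about job control, and stdin is /dev/null so nothing is read.
mode_of() {
  bash --norc --noprofile "$@" -c '
    if shopt -q login_shell; then l=login; else l=non-login; fi
    if [[ $- == *i* ]]; then i=interactive; else i=non-interactive; fi
    echo "$i $l"' 2>/dev/null </dev/null
}

mode_of          # non-interactive non-login  (e.g. "bash command.sh")
mode_of -l       # non-interactive login      (like the grid's "-bash -c ...")
mode_of -i       # interactive non-login      (typing "bash" at a prompt)
mode_of -i -l    # interactive login          (what "become tool" starts, per the thread)
```

Only the last case both reads .bash_profile and has `i` in `$-`, so a .bash_profile guard on `$-` switches "become" sessions to zsh while leaving grid jobs alone.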
>> >
>> > An interactive guard is simply [[ $- = *i* ]] in bash syntax, so what
>> > you want in .bash_profile, expressed in code, is:
>> >
>> >     if [[ $- = *i* ]]; then
>> >         exec zsh
>> >     fi
>> >
>> > As a side note, yes, zsh does exist on the grid hosts:
>> >
>> > zhuyifei1999@tools-sgeexec-0901: ~$ ls -l {/usr,}/bin/zsh
>> > -rwxr-xr-x 1 root root 819744 Dec 1 2020 /bin/zsh
>> > lrwxrwxrwx 1 root root 8 Nov 22 2018 /usr/bin/zsh -> /bin/zsh
>> >
>> > [1] https://man7.org/linux/man-pages/man1/bash.1.html#INVOCATION
>> >
>> > YiFei Zhu
>>
>> Have you had a chance to take a look at it yet?
>>
>> YiFei Zhu
>> _______________________________________________
>> Cloud mailing list -- cloud@lists.wikimedia.org
>> List information:
>> https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
