Configuration Information [Automatically generated, do not change]: Machine: x86_64 OS: linux-gnu Compiler: gcc Compilation CFLAGS: -g -O2 -fdebug-prefix-map=/tmp/bash/bash-5.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wall -Wno-parentheses -Wno-format-security uname output: Linux $nodename 4.9.0-9-amd64 #1 SMP Debian 4.9.168-1+deb9u2 (2019-05-13) x86_64 GNU/Linux Machine Type: x86_64-pc-linux-gnu
Bash Version: 5.0 Patch Level: 3 Release Status: release Description: I discovered a curious performance degradation in the combined usage of the constructs "eval set -- ..." and new-style command substitution. In short, setting the positional arguments via eval and then iterating over each one while performing $() command substitution(s) is significantly slower than not using eval, or not making command substitution, or using `` instead. I include below a reduced test script that illustrates the issue. A few notes: - The pathological case is "1 1 0". - I did not observe performance difference in unoptimised builds (-O0). Example output: case 0 0 X set real 0m0.001s user 0m0.000s sys 0m0.000s for loop real 0m0.008s user 0m0.008s sys 0m0.000s -------------------------- case 0 1 0 set real 0m0.001s user 0m0.000s sys 0m0.000s for loop cmdsubst-currency real 0m0.551s user 0m0.048s sys 0m0.144s -------------------------- case 0 1 1 set real 0m0.001s user 0m0.000s sys 0m0.000s for loop cmdsubst-grave real 0m0.520s user 0m0.048s sys 0m0.116s -------------------------- case 1 0 X eval set real 0m0.002s user 0m0.004s sys 0m0.000s for loop real 0m0.008s user 0m0.008s sys 0m0.000s -------------------------- case 1 1 0 eval set real 0m0.002s user 0m0.000s sys 0m0.000s for loop cmdsubst-currency real 0m0.968s user 0m0.432s sys 0m0.148s -------------------------- case 1 1 1 eval set real 0m0.002s user 0m0.004s sys 0m0.000s for loop cmdsubst-grave real 0m0.529s user 0m0.044s sys 0m0.124s -------------------------- Observations: - The pathological case "1 1 0" spends about 10 times more time doing something in userspace during the loop, relative to the comparable cases "0 1 0", "0 1 1", and "1 1 1". - $() seems generally slightly slower than ``, but becomes pathologically so when preceded with "eval set -- ...". - "eval set -- ..." itself doesn't seem slow at all, but obviously it has side-effects not captured by the "time" measurement tool. I tested an --enable-profiling build but the results are vastly different between test cases so I wasn't able to gain much insight by comparing them. (Also I'm inexperienced with gprof.) For reference, though, these are the top 10 functions in the flat listing for 3 of the test cases: case 0 1 0: % cumulative self self total time seconds seconds calls ms/call ms/call name 18.75 0.09 0.09 4801 0.02 0.02 make_child 13.54 0.16 0.07 1200 0.05 0.05 reap_dead_jobs 13.54 0.22 0.07 sigchld_handler 6.25 0.25 0.03 38586 0.00 0.00 read_token.constprop.9 6.25 0.28 0.03 7259 0.00 0.04 param_expand 4.17 0.30 0.02 9609 0.00 0.01 yyparse 2.08 0.31 0.01 28910 0.00 0.00 assignment 2.08 0.32 0.01 28888 0.00 0.00 set_line_mbstate 2.08 0.33 0.01 14417 0.00 0.00 set_signal_handler 2.08 0.34 0.01 9643 0.00 0.00 zread case 1 1 1: % cumulative self self total time seconds seconds calls us/call us/call name 31.03 0.09 0.09 4801 18.75 18.75 make_child 15.52 0.14 0.05 1200 37.50 37.50 reap_dead_jobs 15.52 0.18 0.05 sigchld_handler 6.90 0.20 0.02 9889 2.02 2.02 hash_search 3.45 0.21 0.01 14464 0.69 0.69 dequote_string 3.45 0.22 0.01 8461 1.18 16.14 param_expand 3.45 0.23 0.01 4808 2.08 3.12 run_sigchld_trap 3.45 0.24 0.01 4801 2.08 23.05 command_substitute 3.45 0.25 0.01 4801 2.08 2.09 wait_for 3.45 0.26 0.01 4800 2.08 2.08 string_extract case 1 1 0 (pathological): % cumulative self self total time seconds seconds calls us/call us/call name 38.89 0.21 0.21 28890 7.27 7.27 set_line_mbstate 7.41 0.25 0.04 9610 4.16 22.19 yyparse 6.48 0.29 0.04 1200 29.17 29.17 reap_dead_jobs 6.48 0.32 0.04 sigchld_handler 5.56 0.35 0.03 4801 6.25 6.25 make_child 3.70 0.37 0.02 10848 1.84 1.84 builtin_address_internal 1.85 0.38 0.01 98462 0.10 0.10 unwind_protect_mem 1.85 0.39 0.01 56852 0.18 0.18 make_word_list 1.85 0.40 0.01 9656 1.04 1.04 unquoted_glob_pattern_p 1.85 0.41 0.01 9622 1.04 3.27 parse_matched_pair Anyway, as you can see I already found a workaround (`` instead of $()) that restores most of the lost performance, so I'm not overly concerned with getting to the bottom of this. I just thought it was curious enough to deserve being reported. Thanks. Repeat-By: -----8<--------------- begin test.sh ---------------------- f() { [ $# -eq 3 ] || return local doeval=$1 local docmdsubst=$2 local docmdsubstgrave=$3 local a=$(seq -s ' ' 1200) if [ "$doeval" -eq 1 ]; then echo -n eval set time eval "set -- $a" else echo -n set time set -- $a fi local i if [ "$docmdsubst" -eq 1 ]; then if [ "$docmdsubstgrave" -eq 1 ]; then echo -n for loop cmdsubst-grave time for i; do printf '%s\n' "`printf %s "$i"`" done >/dev/null else echo -n for loop cmdsubst-currency time for i; do printf '%s\n' "$(printf %s "$i")" done >/dev/null fi else echo -n for loop time for i; do printf '%s\n' "$i" done >/dev/null fi } [ $# -eq 3 ] || exit 1 echo case $1 $2 $3 f $1 $2 $3 echo -------------------------- ---------------------- end test.sh -------------->8------ -----8<--------------- begin main.sh ---------------------- BASH_PROGRAM=${1:-bash} PROCESS_PROFILE_DATA=$2 for I in 0 1; do for J in 0 1; do for K in 0 1; do if [ $J -eq 0 ]; then # K becomes a "don't care" condition if [ $K -eq 0 ]; then K=X else continue fi fi $BASH_PROGRAM test.sh $I $J $K if [ -n "$PROCESS_PROFILE_DATA" ]; then mv gmon.out gmon.out.$I-$J-$K gprof $BASH_PROGRAM gmon.out.$I-$J-$K > gmon.out.$I-$J-$K.table fi done; done; done ---------------------- end main.sh -------------->8------ Run as "bash main.sh", for example.