Date: Tue, 16 May 2017 18:46:39 +0100
From: Stephane Chazelas <stephane.chaze...@gmail.com>
Message-ID: <20170516174639.ga19...@chaz.gmail.com>
| Actually, even for a handful of arguments, and even with gawk,
| it seems it's generally quicker to use awk in my tests:

One can make benchmarks that produce whatever result one wants, of course. Here's mine, for what I expect to be a more life-like application than simply quoting zillions of strings, all of which just happen to contain multiple single quote characters, by accident...

    . "$1"    # this supplies the definition of "quote" and no more

    for file in *
    do
        x=$(quote "$file")
    done

Of course, while that is using "quote" the way I imagined it being used, the real work of a real-life script is missing (what we actually do with x once we have it, and everything else that happens) - but since we're only interested in benchmarking quote implementations, not the "real" part of the "real" application, and really, not the shell either, that's fine I think. This test means that we're not just measuring the startup time of the shell concerned (by running the shell over and over.)

Steffen (and others) can decide if my test, or yours, is more relevant given the use he had in mind originally.

I am running this using the script below (run with bash, as the "time" reserved word and TIMEFORMAT are not posix, I believe). I ran it 3 times, in three different directories.

    printf "In a directory with %s files...\n\n" $(ls | wc -l)

    for shell in sh dash bash fbsh mksh zsh yash
    do
        for script in $(ls /tmp/T)
        do
            TIMEFORMAT="Shell $shell using script $script:	%U"
            (time "$shell" /tmp/timer "/tmp/T/${script}") 2>&1 | expand -t 40
        done
        printf '\n'
    done

/tmp/timer is the first script I included above - though I also tested a version of that which checked the results (by expanding $x and verifying that the result was the same as $file - just to check that the quote functions worked correctly .. both did.)

There is (or should be, no idea what the mailer will do) a tab before %U in TIMEFORMAT, that and the expand are just to make the results look pretty...
This doesn't affect the results at all, nor does the subshell in that, and while there is probably a better way to make pretty results (without yet another awk script!), this worked well enough.

In that list of shells, "sh" is (a quite old version of) the NetBSD shell, "fbsh" is the FreeBSD shell (about a year old, as are most of the others) and the rest you recognise. No ksh93 in there, as (for reasons not relevant here) I don't have that one installed on the system I'm running this test on, and I don't have most of the others installed on a system where I do have ksh93.

There are just 2 scripts in /tmp/T, yours (exactly as you gave it in the last message) and mine (modified from the version I had, which dealt with leading/trailing ' chars by processing them specially, though that version was never posted here, instead now using Jilles' better solution to that problem .. though his generates even more redundant quotes in the cases where it makes a difference - for a leading ' in the input, I would have output a leading \' in the result, now we get ''\' instead. Both are correct of course.)

Incidentally, to refer to another point you made, by "redundant quotes" I did not really mean ones that actually quote something (though the example I gave was not a good one) but extra '' pairs inserted into the output, just because it is convenient for the implementation. They accomplish nothing, hurt nothing (except my eye-balls), and add a little extra processing time for the shell when it processes the result (more chars to deal with), but really don't matter one way or the other.

I'll include the two quote functions used, for completeness, at the end.
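
To illustrate the remark above about redundant quotes, here is a minimal sketch (the variable names are made up for this example, they are not from either script). It takes an input with a leading ' and shows that the compact form, starting with \', and the form with an extra '' pair in front, starting with ''\', both expand back to the same string.

```shell
want="'abc"              # the original string: a leading ' then abc

# Inside double quotes, \\ stands for one backslash.
compact="\\''abc'"       # \''abc'   : escaped quote, then quoted text
verbose="''\\''abc'"     # ''\''abc' : same, with an empty '' pair in front

eval "a=$compact"
eval "b=$verbose"

[ "$a" = "$want" ] && [ "$b" = "$want" ] && echo "both forms give: $a"
```

Run in any posix shell this should print "both forms give: 'abc", the only difference between the two forms being the two extra characters the shell has to read.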
I did try a version of mine, modified in a non-posix way to use "local", rather than explicit saving and restoring (etc), but that only makes any difference at all when the arg to quote contains a single-quote char, which is likely very rare. So there was no real difference between the results for it and for the posix version of my script, it just cluttered the output, and I removed that one (even though if I were actually using this function that would be the version I'd use -- all sane shells support "local").

Now we can expect that most directories will not contain files with ' chars in their names, in fact, there are most likely none - but this is also what is most likely to occur in any real life application. We have to deal with the possibility of it occurring, and handle it properly, but we do not have to expect it and optimise for it.

I also realise that you could optimise your function in the same way I did mine (from the first version) to deal with that case, in which case the results from these tests would probably show almost the same times for your function and mine, so there's no need to do that and reply...

Lastly, I have no idea which version of awk this is using (for your script), that has never bothered me much, but it is most likely not gawk (but with your version, you're always going to be taking your chances with how fast/slow the locally installed awk happens to be.)

These are the results for the 3 directories I ran the script in: the first is my $HOME (relatively small, but not so small as for the test to measure nothing), the second a medium-sized directory with often quite long and messy file names, and the third a directory with a large number of very bland file names (reasonably short ones.)

In a directory with 71 files...

Shell sh using script kre:              0.011
Shell sh using script stephane:         0.055

Shell dash using script kre:            0.010
Shell dash using script stephane:       0.057

Shell bash using script kre:            0.022
Shell bash using script stephane:       0.082

Shell fbsh using script kre:            0.011
Shell fbsh using script stephane:       0.046

Shell mksh using script kre:            0.052
Shell mksh using script stephane:       0.069

Shell zsh using script kre:             0.022
Shell zsh using script stephane:        0.077

Shell yash using script kre:            0.019
Shell yash using script stephane:       0.051

In a directory with 11522 files...

Shell sh using script kre:              1.956
Shell sh using script stephane:         8.410

Shell dash using script kre:            1.780
Shell dash using script stephane:       9.196

Shell bash using script kre:            6.656
Shell bash using script stephane:       15.623

Shell fbsh using script kre:            1.852
Shell fbsh using script stephane:       7.831

Shell mksh using script kre:            8.788
Shell mksh using script stephane:       10.599

Shell zsh using script kre:             4.510
Shell zsh using script stephane:        14.621

Shell yash using script kre:            3.489
Shell yash using script stephane:       8.678

In a directory with 77062 files...

Shell sh using script kre:              14.026
Shell sh using script stephane:         58.061

Shell dash using script kre:            13.398
Shell dash using script stephane:       64.897

Shell bash using script kre:            27.751
Shell bash using script stephane:       87.852

Shell fbsh using script kre:            13.407
Shell fbsh using script stephane:       53.715

Shell mksh using script kre:            63.250
Shell mksh using script stephane:       74.073

Shell zsh using script kre:             57.982
Shell zsh using script stephane:        132.030

Shell yash using script kre:            24.901
Shell yash using script stephane:       59.346

And finally, here are the scripts, first mine (/tmp/T/kre):

-------------------------------------------------------------
quote() {
    case "$1" in
    *\'*)   ;;      # the harder case, we will get to below.
    *)      printf "'%s'" "$1"; return 0;;
    esac

    _save_IFS="${IFS}"; ${IFS+":"} unset _save_IFS
    _save_OPTS="$(set +o)"
    IFS=\'
    set -f
    set -- ''$1''
    _result_="${1}"
    shift
    for __A__
    do
        _result_="${_result_}'\\''${__A__}"
    done
    printf "'%s'" "${_result_}"

    # now clean up
    IFS="${_save_IFS}"; ${_save_IFS+":"} unset IFS
    eval "${_save_OPTS}"
    unset _result_ _save_IFS _save_OPTS __A__
    return 0;
}
-------------------------------------------------------------

and then yours (/tmp/T/stephane):

-------------------------------------------------------------
quote() {
  LC_ALL=C awk -v q="'" -v b='\\' '
    function quote(s) {
      gsub(q, q b q q, s)
      return q s q
    }
    BEGIN {
      sep = ""
      for (i = 1; i < ARGC; i++) {
        printf "%s", sep quote(ARGV[i])
        sep = " "
      }
      if (sep) print ""
    }' "$@"
}
-------------------------------------------------------------

kre
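
For anyone wanting to repeat the correctness check mentioned earlier (expand the quoted result and compare it with the original string), the following is a minimal sketch of the idea, not one of the scripts that was timed. It assumes a shell that supports "local" (which is not posix), and the names quote_local and check, as well as the sample strings, are invented for this example.

```shell
# Hypothetical "local" variant of the shell quote function (an assumption,
# not one of the two scripts whose timings are reported).
quote_local() {
    case "$1" in
    *\'*)   ;;                          # contains a ', handled below
    *)      printf "'%s'" "$1"; return 0;;
    esac

    local IFS result arg opts
    opts=$(set +o)          # remember option settings, to undo "set -f"
    IFS=\'
    set -f
    set -- ''$1''           # split at each ', keeping empty end fields
    result=$1
    shift
    for arg
    do
        result="${result}'\\''${arg}"
    done
    printf "'%s'" "$result"
    eval "$opts"
}

# Round trip: quote the string, let the shell parse it back, and compare.
check() {
    x=$(quote_local "$1")
    eval "y=$x"
    [ "$y" = "$1" ]
}

failed=0
for s in "plain" "it's" "'lead" "trail'" "a''b" "'" 'two  spaces *'
do
    check "$s" || { printf 'mismatch for: %s\n' "$s"; failed=1; }
done
[ "$failed" -eq 0 ] && echo "all round trips ok"
```

The sample strings deliberately include leading, trailing and doubled ' characters, since those are the cases where the empty '' pairs come into play.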