Hi Roland,

Roland Mainz wrote on Fri, 10. 07. 2009 at 00:58 +0200:
> Milan Jurik wrote:
> > On Thu, 09. 07. 2009 at 15:21, Sean McGrath wrote:
> > > With the coming ksh93 update 2 replacing several commands
> > > like wc, tail, head, join, etc., there's a need for a benchmark
> > > to measure at least the before and after of the ksh93 update 2
> > > change.
> > >
> > > Roland and I were talking on IRC last night about this. We'll need
> > > to figure out a decent method of benchmarking these commands.
> >
> > How is it possible that Roland discovers the responsible people
> > every time? :-)
>
> Well... part of the secret is that I use a komodo dragon (preferably a
> hungry one), a whip (wet, with salt) and a small egg... that way you
> can get any information out of people (yes, yes, it's cruel&&unusual)
> ... =:-)
>
Your latest source informed me about your magic already ;-)

> > > So within the next few days we hope to work out a method for
> > > benchmarking ksh93.
> > > This hopefully is the start of that discussion, rather than blindly
> > > writing ad hoc timing scripts.
> > >
> > > One way, suggested by Roland, could be:
> > >
> > > cmd = mkdir:
> > >
> > > timex ksh93 -c 'rmdir "xyz" >/dev/null ; \
> > >     for ((i=0 ; i < 1000 ; i++)) ; do /bin/mkdir -p "xyz" ; done'
> > >
> > > That would benchmark the on-disk mkdir. To use ksh93's builtin
> > > mkdir, just remove the '/bin/':
> > >
> > > timex ksh93 -c 'rmdir "xyz" >/dev/null ; \
> > >     for ((i=0 ; i < 1000 ; i++)) ; do mkdir -p "xyz" ; done'
> >
> > Do not test it as a ksh93 command, but through the wrapper. So not
> > ksh93 -c 'tail', but /usr/bin/tail. That is the real impact.
>
> Erm... that's not 100% correct. The test matrix should look like this:
> [ old-version, new-version, ksh93-builtin-command ] * [ C-locale,
> multibyte-locale ]
>
> Explanation of terms:
> - "old-version" means the old versions of the commands
> - "new-version" means the new versions of the commands
> - "ksh93-builtin-command" means running the loop within a ksh93 shell
>   using plain command names [1] [2]
> - "C-locale" means something like $ LC_ALL=C ./test-script #
> - "multibyte-locale" means something like
>   $ LC_ALL=en_US.UTF-8 ./test-script #
>   This is needed since the tools sometimes have different codepaths
>   for single-byte locales (like "C") and multibyte locales (like
>   "en_US.UTF-8" or "ja_JP.PCK")
>
> [1]=(this is important to measure the impact for OpenSolaris/Indiana
> where the default system shell is ksh93 (e.g. /usr/bin/sh, /sbin/sh,
> /usr/bin/ksh, /usr/bin/ksh93 are all ksh93))
> [2]=Note that a POSIX-conformant shell (like ksh93) will only use
> builtin commands if you use the command name (e.g. "mkdir") and not the
> full path (e.g. "/usr/bin/mkdir").
> Or better: using the full path makes
> sure the shell always uses the non-builtin command from /usr/bin/.

For the C-team review I believe the most important thing is the
performance regression of the replaced/updated commands, because
builtin commands can be bypassed and they are not the most important
part of update 2. Also, comparing a builtin command against the old
version of the command has nothing to do with performance regression
testing; it is more of a benchmark project (important to have, but not
a show stopper for update 2).

> > > Another method, using the above example, could be to see how many
> > > times mkdir got called in a given time period.
> >
> > The same number of commands is good enough. Probably run it several
> > times.
> >
> > > Other than basic benchmarking, the environment can be measured
> > > too, i.e. the locale can have an impact, e.g. LC_ALL=C and
> > > LC_ALL=en_US.UTF-8
> >
> > +1
>
> Right - see test matrix above...
>
> > > Also to be looked at is the data size used with commands, e.g.
> > > tail -X on a large or small file. Small being about 256k or so and
> > > large being at least 1GB.
> >
> > +1
> >
> > A file bigger than RAM should be good.
>
> BTW: Some notes:
> - "tail" _may_ now be a bit slower since it no longer uses |mmap()|
>   (which was one of the root causes for crashes (e.g. if the underlying
>   file shrinks while "tail" reads it))
> - some commands like "join" should be faster now since they use
>   |mmap()| (but we have an option to turn this behaviour off to avoid
>   running into the issue described with "tail" above)
> - command startup time may be slightly higher since we now depend on
>   two more libraries (libcmd, libast) which need to be looked-up&&loaded.
>   This should be somewhat compensated by the detail that the AST tools
>   are tuned more for large amounts of data

It is good to know this and document it as part of the performance
results. But except for "mmap", the impact should not be critical.

> - please use tmpfs (e.g.
>   /tmp) for reading/writing from/to files to avoid getting noise from
>   the disk I/O system
>

I think Sean's team is very good at performance testing ;-)

> > > For starters, is there a definitive list of the commands we'd want
> > > to look at? i.e. those being replaced by ksh93.
> >
> > I think the list is definitive and you can find it here (in Notes):
> >
> > http://www.opensolaris.org/os/project/ksh93-integration/downloads/2009-07-02/
> >
> > The optimal thing would be to test not only those which are replaced
> > now, but also those which were already replaced and are updated by
> > this update.
> >
> > Only /usr/bin/print is a new command, so we do not need to test it.
> >
> > For testing all internal ksh93 commands, I would say no for now. It
> > can be a separate project to do complete ksh93 benchmarking. But we
> > should concentrate on update 2 for now.
>
> Well, it's not thought of as a "ksh93 benchmark" (since it doesn't
> cover any special shell features like string processing, math, array
> operations etc.) - the idea was to figure out the impact on
> OpenSolaris/Indiana where the use of builtin commands in the default
> system shell has a direct impact on system performance (e.g. at _least_
> fewer |fork()|+|exec()| calls).

Yes. But then you are comparing only the old and the new ksh93. That is
not the core topic of update 2 for now, because update 2 concentrates on
bug fixes and new commands.

Best regards,
Milan
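A minimal sketch of a driver for the test matrix Roland describes above, [ old-version, new-version, ksh93-builtin-command ] x [ C-locale, multibyte-locale ], reusing the mkdir loop from the thread. Assumptions that are not from the thread: OLD_BIN and NEW_BIN are hypothetical variables naming directories that hold the old and new command binaries (both default to /usr/bin, so they must be pointed at the right builds), ksh93 is on $PATH, the en_US.UTF-8 locale is installed, and /tmp is tmpfs (as it is on Solaris). Instead of timex on the whole process, it uses ksh93's `time` keyword on the loop, which excludes shell startup; wrapping the ksh93 call in timex as in the thread would work as well. The builtin variant relies on note [2]: the bare name picks the builtin, a full path does not.

```shell
#!/bin/sh
# Sketch of a driver for the benchmark matrix discussed in this thread:
#   [ old-version, new-version, ksh93-builtin-command ] x [ C, en_US.UTF-8 ]
# Assumptions (not from the thread): OLD_BIN/NEW_BIN are hypothetical
# directories holding the old and new binaries, ksh93 is on $PATH and
# the en_US.UTF-8 locale is installed.

: "${ITERATIONS:=1000}"               # loop count, as in the mkdir example
: "${WORKDIR:=/tmp/ksh93-bench.$$}"   # /tmp is tmpfs on Solaris: no disk I/O noise
: "${OLD_BIN:=/usr/bin}"              # hypothetical: old command binaries
: "${NEW_BIN:=/usr/bin}"              # hypothetical: new command binaries

VARIANTS="old new builtin"
LOCALES="C en_US.UTF-8"

# Print the command to run for one variant. Following note [2], the bare
# name selects the ksh93 builtin, while a full path bypasses it.
cmd_for() {   # $1 = variant, $2 = command name
    case $1 in
        old)     printf '%s\n' "$OLD_BIN/$2" ;;
        new)     printf '%s\n' "$NEW_BIN/$2" ;;
        builtin) printf '%s\n' "$2" ;;
        *)       return 1 ;;
    esac
}

# Print every "variant locale" pair of the matrix, one per line.
matrix_cells() {
    for v in $VARIANTS; do
        for l in $LOCALES; do
            printf '%s %s\n' "$v" "$l"
        done
    done
}

# Time ITERATIONS runs of "mkdir -p xyz" for one cell, inside ksh93.
run_cell() {   # $1 = variant, $2 = locale
    cmd=$(cmd_for "$1" mkdir) || return 1
    printf '== %s, LC_ALL=%s ==\n' "$1" "$2"
    mkdir -p "$WORKDIR" || return 1
    (
        cd "$WORKDIR" || exit 1
        rmdir xyz 2>/dev/null         # start each cell without xyz
        LC_ALL=$2 ITER=$ITERATIONS CMD=$cmd ksh93 -c \
            'time for ((i=0 ; i < ITER ; i++)) ; do $CMD -p xyz ; done'
    )
}

# Only run the benchmark when asked to; otherwise just define functions.
if [ "${1:-}" = run ]; then
    matrix_cells | while read -r variant locale; do
        run_cell "$variant" "$locale" </dev/null
    done
    rm -rf "$WORKDIR"
fi
```

Run it with `sh ./bench-matrix.sh run` (the file name is only an example). The same `run_cell` pattern can be copied for tail, head, join, etc., and repeated several times per cell as Milan suggests.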