Art Celestini wrote:
> I'm convinced that TRE and TR are faster but it seems that
> a truly "fair" comparison of solutions to the stated problem
> should have included equivalent moves in the TRE and TR
> solutions.

I did write and run versions with the code like that. And I said so:

| I've got code that needs to translate stuff
| in a buffer and it does not need it moved. And I have other
| code that first moves it and then translates it, because it
| doesn't want to clobber what it's translating. But, I did it
| both ways, just to find out for sure if it made a difference.
| It does not.

But since you asked, I added those into the mix, so you can see and
judge for yourself:

TIME (IN SECONDS) FOR 001,000,000 REPETITIONS OF:
-------------------------------------------------
--BYTES-   NO TR(E)   TRE INPL    TRE MVC    TR INPL     TR MVC
========  ---------  ---------  ---------  ---------  ---------
00000800  14.939655   1.245189   1.642310   1.082476   1.236875
00000400   7.162529   0.731567   0.971124   0.487941   0.580783
00000200   3.593004   0.461754   0.673962   0.206117   0.268123
00000100   1.802772   0.253433   0.342846   0.032038   0.050725
000000C0   1.355390   0.240958   0.311724   0.031969   0.048488
00000080   0.909253   0.210573   0.276103   0.031942   0.046119
00000040   0.463195   0.150320   0.164585   0.032047   0.043604
00000020   0.238923   0.101492   0.113927   0.032032   0.042417
0000001E   0.225827   0.111231   0.122245   0.032019   0.042544
0000001C   0.210944   0.110432   0.122021   0.031966   0.042432
0000001A   0.197080   0.110823   0.122119   0.031953   0.042508
00000018   0.183400   0.104318   0.116599   0.031982   0.042673
00000016   0.169207   0.099349   0.110853   0.031980   0.042465
00000014   0.155477   0.100393   0.109962   0.032081   0.042704
00000012   0.141733   0.099860   0.111362   0.031961   0.042495
00000010   0.127308   0.070471   0.083389   0.031962   0.041866
0000000E   0.113336   0.074843   0.086993   0.031981   0.041867
0000000C   0.099318   0.073958   0.086677   0.031962   0.041833
0000000A   0.085462   0.074848   0.086733   0.032057   0.041985
00000008   0.071609   0.069932   0.081476   0.030228   0.038990
00000007   0.064623   0.058755   0.068647   0.030245   0.039025
00000006   0.057541   0.058729   0.068720   0.030278   0.038971
00000005   0.050582   0.058701   0.068568   0.030230   0.038931
00000004   0.043603   0.058764   0.068620   0.030246   0.039029
00000003   0.036664   0.058748   0.068683   0.030220   0.038934
00000002   0.029665   0.058824   0.068732   0.030386   0.039100
00000001   0.022716   0.059113   0.069109   0.029829   0.038662
00000000   0.005250   0.016894   0.005825   0.005239   0.005835

TESTNAME  DESCRIPTION
--------  --------------------------------------------
NO TR(E)  Basic move and translate, one byte at a time
TRE INPL  TRE loop in-place
TRE MVC   TRE loop buffer-to-buffer move first
TR INPL   TR loop in-place
TR MVC    TR loop buffer-to-buffer move first

TR is always faster than TRE. Having to move the data from an input
buffer to a separate output buffer for translation increases the CPU
time required by ~15% for TR at the largest size tested; across the
nonzero lengths in the table the penalty runs from roughly 10% to 60%.
That is still way less than the overhead of the basic move and
translate, which beats the move-first variants only for 0, 1, 2, or 3
bytes (TR in place is already ahead at 3 bytes). For more than 3
bytes, the basic TR loop, or even the TR loop with the data to be
translated having to be moved to the output buffer first, is fastest.

The above figures include the "equivalent moves" to make it a 'truly
"fair" comparison of solutions to the stated problem'. They reflect
what I have always observed about such tests: a well-coded, basic,
tight MVC loop (or an MVCL) is pretty fast compared to almost anything
else that involves a half-dozen or so instructions that do virtually
anything. Thus, counting the CPU time that is required to move the
data to a separate buffer as part of the overhead doesn't actually add
that much to the CPU time required to get the whole job done.

I suspect that this is simply due to the fact that MVC and MVCL are
already pretty well-optimized for the job they do. Even a basic, tight
loop will be limited by some performance constraint, probably by the
rate at which instructions whose execution cannot be overlapped can be
pumped through the machine (in contrast to blobs of data MVCing and
TR[T]ing through the wires all as part of one instruction).

Today, for all intents and purposes, the time required to execute any
given standard instruction is the same as any other.
This is because the work to be done can be done in the available
time, before another instruction is fetched and shoved through the
internal machinery.

The instructions which process more than a word or two of data take
longer, of course. But some of those are very highly optimized in
hardware. For example, the LM and STM instructions are no longer pigs.
They are, in fact, fairly effective substitutes for MVC, except that
you toast the contents of several registers when you use enough of
them to make it worthwhile.

Thus, optimization in our world today, except for the scientific and
numerical programmer, has become a job of simply cutting down the
number of instructions one executes ... that is, simply cutting the
path length. And that is the simple reason why a basic TR loop, even
with an MVC/MVCL (just) ahead of it, is still the quickest way to get
the job done: fewer instructions overall are executed.

-- WB

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [EMAIL PROTECTED] with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html
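
For anyone without an assembler handy, the shape of the comparison
can be sketched in Python. This is only a rough analogy, not the code
that produced the timings: `bytearray.translate` stands in for TR, a
slice copy stands in for MVC, and the table (ASCII lowercase to
uppercase) and all function names are invented for illustration. Any
timings it prints reflect the Python interpreter, not the machine
instructions discussed here.

```python
import timeit

# 256-entry translate table (illustrative): ASCII lowercase -> uppercase.
TABLE = bytes(b - 32 if 97 <= b <= 122 else b for b in range(256))


def bytewise_move_translate(src, dst):
    """Analogue of NO TR(E): move and translate one byte at a time."""
    for i, b in enumerate(src):
        dst[i] = TABLE[b]


def translate_in_place(buf, n):
    """Analogue of TR INPL: one bulk table translate over buf[0:n].

    Python builds a temporary copy here, so this is "in place" only in
    the sense that the result lands back in the same buffer.
    """
    buf[:n] = buf[:n].translate(TABLE)


def move_then_translate(src, dst):
    """Analogue of TR MVC: bulk copy first, then translate the copy."""
    n = len(src)
    dst[:n] = src                # stands in for the MVC
    translate_in_place(dst, n)   # stands in for the TR


if __name__ == "__main__":
    src = bytes(range(256)) * 8  # X'800' bytes, the largest size tested
    dst = bytearray(len(src))
    work = bytearray(src)
    for name, fn in [
        ("bytewise", lambda: bytewise_move_translate(src, dst)),
        ("in place", lambda: translate_in_place(work, len(work))),
        ("move first", lambda: move_then_translate(src, dst)),
    ]:
        print(f"{name:10s} {timeit.timeit(fn, number=200):.6f} s")
```

The point of the sketch is the path-length argument: the bytewise
version executes work per byte in the loop itself, while the other two
hand the whole buffer to one or two bulk operations.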