Scott, thanks again. With that elaboration of your explanation, and the correction of my having copied the wrong line from IBM, I think we have the answer.
> a 1 second job is relatively quick.

Yeah, I just used 1 because it is simple. The real CPU times involved are from about 4 to 400 CPU seconds. (A range of jobs; NOT a 100:1 ratio between machines for the same job.)

> your CPU time increases as your application has to go further into the memory
> hierarchy to find the data

I know that well! I've posted this story before, but here it goes again. I was until recently responsible for an event-driven application. I had a test driver that would queue "events" from a file at a specified pace, for regression testing, benchmarking, and so forth. When I drove the application very slowly -- say 10 events per second -- it used roughly TWICE as much CPU time per event as when I pushed events through it as fast as it could process them, which was several hundred times that fast. My theory -- I did not have the means to confirm it -- was that when I drove it hard it "owned" the cache lines.

When I started in this business it was an axiom that for a given workload, wall-clock times were variable but CPU times were deterministic. That obviously no longer holds.

> Nothing is simple...

Or as the Db2 folks like to say, "it depends."

I do think I now have my arms around this question. Thanks all,

Charles

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:[email protected]] On Behalf Of Scott Chapman
Sent: Sunday, December 15, 2019 6:02 AM
To: [email protected]
Subject: Re: How do I compare CPU times on two machines?

>> The numbers below (from IBM.com) do not seem to support what you are saying,
>> however: "if you're trying to convert CPU time between machines, the ratio
>> of any of SUs, MSUs, or PCI will be pretty much equally 'fine'." The ratio
>> of the PCIs of the two machines is about eight-to-one, but they seem in
>> practice to be *about* the same speed: that is, a job that uses about 1 CPU
>> second on one seems to use about 1 CPU second on the other (certainly not
>> eight times as much!).
>> The SU/SEC ratio for the two machines is 40404/33333,
>> which seems to more accurately reflect observed reality (although way less
>> than perfectly! -- less perfectly than a guess of "oh, I guess they are
>> about the same speed").
>>
>> Processor  #CP  PCI     MSU    MSUps  Low    Average  High
>> 2817-730   30   23,929  2,855  2,370  49.54  42.75    37.96
>>
>> Processor  #CP  PCI    MSU  Low   Average  High
>> 2818-Z05   5    3,139  388  6.18  5.61     4.77

Sorry... I failed to mention that you have to use the per-CPU ratings. SU/sec is already on a per-CPU basis, which is why that number seems more in line with what you expect.

23929 / 30 = 797.6
2855 / 30 = 95.1
3139 / 5 = 627.8
388 / 5 = 77.6

797.6 / 627.8 = 1.27
95.1 / 77.6 = 1.22
40404 / 33333 = 1.21

The PCI ratio is a bit farther off from the other two, but again, these are rough estimates, and to that degree they're reasonably close. We're drawing with the fat crayons here, not fine drafting pens.

But... I just realized you used the SU/sec from the 2818-Z04, not the Z05, which is 32258.

40404 / 32258 = 1.25

Which is pretty much in the middle of the other two ratios, so it all seems to match up as I'd expect now.

Re. your "a job on one machine uses about 1 second of CPU and uses about 1 second of CPU on the other": if 1.00 is about 1.25, then I think all is as one might expect. But a 1 second job is relatively quick. And there's probably other work on the systems that could be influencing both. For example, the larger machine may have more work running that's having a larger negative impact on the test job running on that machine, so it could actually consume more CPU time than the test job running on the notionally slower machine, if the slower machine is relatively idle when the test job runs. LPAR configurations can also play in here, sometimes significantly.

Remember, your CPU time increases as your application has to go further into the memory hierarchy to find the data. (I.e., if the instructions/data weren't in L1 cache.)
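For anyone who wants to script this comparison, the per-CPU normalization above can be sketched in a few lines of Python. The figures are the PCI, MSU, and SU/sec ratings quoted in this thread for the 2817-730 (30 CPs) and 2818-Z05 (5 CPs); the helper name is mine, not any IBM API.

```python
def per_cpu(total_rating, n_cpus):
    """PCI and MSU are whole-machine ratings; divide by #CP before comparing.
    SU/sec is published on a per-CPU basis already, so use it as-is."""
    return total_rating / n_cpus

# 2817-730 (30 CPs) vs. 2818-Z05 (5 CPs), ratings as quoted in the thread.
pci_ratio = per_cpu(23929, 30) / per_cpu(3139, 5)  # roughly 1.27
msu_ratio = per_cpu(2855, 30) / per_cpu(388, 5)    # roughly 1.23
su_sec_ratio = 40404 / 32258                       # roughly 1.25 (already per-CPU)

print(f"PCI {pci_ratio:.2f}, MSU {msu_ratio:.2f}, SU/sec {su_sec_ratio:.2f}")
```

All three ratios cluster around 1.2-1.3, which is the "fat crayon" agreement described above; which one you pick matters less than remembering to divide the whole-machine ratings by the CP count.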
So on a busier system, other work (especially higher-priority work) may be making it harder for a particular test job to keep its data closer to the processor core. That's also why you'll see potentially significant variations between runs of the same exact job, and why I always want to see multiple re-runs, so I can understand the "normal" variation. (But one still needs to take into account the current system activity: "normal" variation will itself vary.)

Nothing is simple...

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
