Scott, thanks again. With that elaboration of your explanation, and the 
correction of my copying the wrong line from IBM, I think we have the answer.

> a 1 second job is relatively quick. 

Yeah, I just used 1 because it is simple. The real CPU times involved are from 
about 4 to 400 CPU seconds. (A range of jobs; NOT a 100:1 ratio between 
machines for the same job.)

> your CPU time increases as your application has to go further into the memory 
> hierarchy to find the data

I know that well! I've posted this story before, but here it is again. Until 
recently I was responsible for an event-driven application. I had a test driver 
that would queue "events" from a file at a specified pace, for regression 
testing, benchmarking, and so forth. When I drove the application very slowly 
-- say 10 events per second -- it used roughly TWICE as much CPU time per event 
as when I pushed events through it as fast as it could process them, which was 
several hundred times that fast. My theory -- I did not have the means to 
confirm it -- was that when I drove it hard it "owned" the cache lines.
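The shape of that test driver can be sketched in Python. This is a hypothetical stand-in, not the original application: process_event, the event counts, and the rates are invented for illustration.

```python
# Hypothetical sketch of a paced test driver: queue events at a fixed
# rate and measure CPU time per event with time.process_time(), which
# counts process CPU time and excludes time spent sleeping.
import time

def process_event(event_id):
    # Stand-in for the real event handler.
    return sum(x * x for x in range(200))

def cpu_per_event(n_events, rate_per_sec):
    """Drive n_events at rate_per_sec; return CPU seconds per event."""
    interval = 1.0 / rate_per_sec
    start = time.process_time()
    for e in range(n_events):
        process_event(e)
        time.sleep(interval)  # pace the driver; sleep burns no CPU time
    return (time.process_time() - start) / n_events

# On a cache-sensitive workload one would compare something like:
#   slow = cpu_per_event(1000, 10)      # ~10 events/second
#   fast = cpu_per_event(1000, 10000)   # near flat-out
```

On a real workload the slow-paced run can show noticeably higher CPU per event, consistent with the cache-ownership theory above.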

When I started in this business it was an axiom that for a given workload wall 
clock times were variable but CPU times were deterministic. That obviously no 
longer holds.

> Nothing is simple...

Or as the Db2 folks like to say, "it depends."

I do think I now have my arms around this question.

Thanks all,

Charles


-----Original Message-----
From: IBM Mainframe Discussion List [mailto:[email protected]] On Behalf 
Of Scott Chapman
Sent: Sunday, December 15, 2019 6:02 AM
To: [email protected]
Subject: Re: How do I compare CPU times on two machines?

>> The numbers below (from IBM.com) do not seem to support what you are saying 
>> however: "if you're trying to convert CPU time between machines, the ratio 
>> of any of SUs, MSUs, or PCI will be pretty much equally "fine"." The ratio 
>> of the PCI's of the two machines is about eight-to-one but they seem in 
>> practice to be *about* the same speed: that is, a job that uses about 1 CPU 
>> second on one seems to use about 1 CPU second on the other (certainly not 
>> eight times as much!). The SU/SEC ratio for the two machines is 40404/33333 
>> which seems to more accurately reflect observed reality (although way less 
>> than perfectly! -- less perfectly than a guess of "oh, I guess they are 
>> about the same speed").
>>
>> Processor   #CP   PCI     MSU    MSUps  Low    Average  High
>> 2817-730    30    23,929  2,855  2,370  49.54  42.75    37.96
>>
>> Processor   #CP   PCI     MSU   Low   Average  High
>> 2818-Z05    5     3,139   388   6.18  5.61     4.77
>>

Sorry... I failed to mention that you have to use the Per CPU ratings. SU/sec 
is already on a per CPU basis, which is why that number seems more in line with 
what you expect. 

23929 / 30 = 797.6    2855 / 30 = 95.1
3139 / 5 = 627.8     388 / 5 = 77.6

797.6 / 627.8 = 1.27
95.1 / 77.6 = 1.22
40404 / 33333 = 1.21

The PCI ratio is a bit farther off from the other two, but again, these are 
rough estimates and to that degree they're reasonably close. We're drawing with 
the fat crayons here, not fine drafting pens. 

But... I just realized you used the SU/sec from the 2818-Z04, not the Z05. The 
Z05's SU/sec is 32258.

40404 / 32258 = 1.25

Which is pretty much in the middle of the other two ratios, so it all seems to 
match up as I'd expect now. 
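The per-CPU arithmetic above can be written out as a small sketch. The ratings are the IBM numbers quoted in this thread; per_cpu_ratio is just a helper name for illustration.

```python
# Normalize total-machine capacity ratings (PCI, MSU) to a per-CPU
# basis before taking the ratio between two machines. SU/sec is
# already per CPU, so it is compared directly.

def per_cpu_ratio(rating_a, rating_b, ncp_a, ncp_b):
    """Ratio of two machine ratings after dividing each by its CP count."""
    return (rating_a / ncp_a) / (rating_b / ncp_b)

# 2817-730 (30 CPs) vs 2818-Z05 (5 CPs)
pci_ratio = per_cpu_ratio(23929, 3139, 30, 5)  # ~1.27
msu_ratio = per_cpu_ratio(2855, 388, 30, 5)    # ~1.23
su_ratio = 40404 / 32258                       # already per CPU, ~1.25

print(f"PCI per-CPU ratio: {pci_ratio:.2f}")
print(f"MSU per-CPU ratio: {msu_ratio:.2f}")
print(f"SU/sec ratio:      {su_ratio:.2f}")
```

All three ratios land in the 1.2-1.3 range, which is about as much agreement as fat-crayon estimates can offer.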

Re. your "a job on one machine uses about 1 second of CPU and uses about 1 
second of CPU on the other": if 1.00 is "about" 1.25, then I think all is as 
one might expect. 

But a 1 second job is relatively quick. And there's probably other work on the 
systems that could be influencing both. For example, the larger machine may 
have more work running that's having a larger negative impact on the test job 
running on that machine, so it could actually consume more CPU time than the 
test job running on the notionally slower machine if the slower machine is 
relatively idle when the test job runs.  LPAR configurations can also play in 
here, sometimes significantly. 

Remember, your CPU time increases as your application has to go further into 
the memory hierarchy to find the data (i.e., if the instructions/data weren't 
in L1 cache). So on a busier system, other work (especially higher priority 
work) may be making it harder for a particular test job to keep its data closer 
to the processor core. That's also why you'll see potentially significant 
variations between runs of the same exact job. That's why I always want to see 
multiple re-runs, so I can understand the "normal" variation. (But one still 
needs to take into account the current system activity: "normal" variation will 
itself vary.)
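One simple way to put a number on that "normal" variation across re-runs is the coefficient of variation. The CPU-time samples below are invented for illustration.

```python
# Summarize run-to-run variation of repeated runs of the same job.
import statistics

runs = [4.1, 4.4, 3.9, 4.6, 4.2]  # CPU seconds per run (hypothetical)

mean = statistics.mean(runs)
stdev = statistics.stdev(runs)
cv = stdev / mean  # coefficient of variation: spread relative to the mean

print(f"mean={mean:.2f}s  stdev={stdev:.2f}s  cv={cv:.1%}")
```

If the between-machine ratio you're trying to detect is smaller than the coefficient of variation, more re-runs (on both machines) are needed before the comparison means anything.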

Nothing is simple...

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
