Re: Fw: Dataspace versus common area above the bar

Kenneth Wilkerson Tue, 21 Jan 2014 18:15:53 -0800

I have never found comparing instruction speeds to be a fair gauge of
performance. It is not the choice of instructions (unless the original
choices were very poor) that affects performance, but the choice of algorithms.
As has been pointed out, I have never seen any evidence that converting an
algorithm that uses data spaces and ALETs to one that uses 64-bit instructions
and shared memory objects would result in any measurable difference in
performance (2% or more, for example). However, if the change made it possible
to significantly reduce the working set size, or to search less frequently,
it can often yield significant reductions in overhead.


Some things are very difficult to quantify. For example, there is
significant argument over the advantages of transactional memory versus
locks. On the surface, locking is more efficient, but at a cost to
throughput. Transactional memory can use more cycles but improve throughput.
So how do you quantify this?

Almost 30 years ago, I developed a non-traditional storage manager that does
not use chains. As a result, it does not experience storage fragmentation.
Its path length varies only slightly from the 1st call to the millionth. As a
result, it outperforms chained storage managers that require locks many times
over, and the advantage grows as the number of calls grows.
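Kenneth does not describe how his storage manager works, so the following is only a hypothetical Python illustration of one chain-free scheme, not his design: a pool of fixed-size cells whose free cells are recorded as indexes in a preallocated stack instead of a chain threaded through the cells. CellPool, get, put and cell are invented names, and the sketch is single-threaded with no locking or serialization.

```python
class CellPool:
    """Hypothetical fixed-size cell pool with no free-chain pointers.

    Free cells are tracked as indexes in a preallocated stack rather than
    as a linked chain threaded through the cells, so get() and put() run
    essentially the same short path no matter how many calls came before,
    and fixed-size cells never leave unusable holes between allocations.
    """

    def __init__(self, ncells, cell_size):
        self.cell_size = cell_size
        self.storage = bytearray(ncells * cell_size)
        # Lowest index on top of the stack, so cells are handed out 0, 1, 2, ...
        self.free = list(range(ncells - 1, -1, -1))

    def get(self):
        """Return the index of a free cell."""
        if not self.free:
            raise MemoryError("cell pool exhausted")
        return self.free.pop()

    def put(self, index):
        """Return a cell to the pool (no double-free check in this sketch)."""
        self.free.append(index)

    def cell(self, index):
        """Writable view of one cell's bytes."""
        start = index * self.cell_size
        return memoryview(self.storage)[start:start + self.cell_size]


pool = CellPool(4, 16)  # four hypothetical 16-byte cells
a = pool.get()
b = pool.get()
pool.cell(b)[0] = 0xFF  # write into the second cell
pool.put(a)             # cell a goes back on top of the stack
```

Because every request is for one fixed-size cell, free cells never need searching or coalescing, which is one simple way to get a path length that does not grow with the number of calls.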

Again, I have never seen significant gains from keeping the same algorithms and
simply changing the instructions, whereas I have seen x-fold reductions in
overhead from improving algorithms.

Kenneth


-----Original Message-----
From: IBM Mainframe Discussion List [mailto:[email protected]] On
Behalf Of Jim Mulder
Sent: Tuesday, January 21, 2014 7:25 PM
To: [email protected]
Subject: Re: Fw: Dataspace versus common area above the bar

> <begin extract>
> AMODE does not affect performance.  Can you explain which instructions
> you think are faster than some functional equivalent, and why you
> think they are faster?
> </end extract>
> 
> and it may be that what we have here is a misunderstanding of my
> language.  Let me begin with a little history.  On System/360 models
> above the model 30, L was faster than LH because they had [at least]
> four-byte fetch widths and had to 'throw away' half of what they
> fetched for LH.
> 
> In my experience, and I have made many measurements, the same
> principle continues to apply mutatis mutandis today.
> 
> I, for example, have a pair of assembly-language glb-seeking (greatest
> lower bound) binary search routines that search the same table of
> quadword elements. One of these routines is AMODE(31) and one AMODE(64).
> The table (the same assembled table is always used) contains 63 elements.
> The usual 127 searches are performed, each 256 times. The upshot is that
> the AMODE(64) routine is measurably faster, by 2.1201%.
> 
> I have performed similar tests using searches of ordered lists of
> 10(10)200 elements (10 to 200 in steps of 10).  They are more addressing-intensive, and the
> superiority of the AMODE(64) routine increases almost linearly with
> table size, from 2.0897% for a list of 10 elements to 2.3311% for a
> list of 200 elements.
> 
> Now it may be that what you mean by "AMODE does not affect
> performance" is different from what I mean.  If so, I should be
> pleased to have you clarify the ways in which our uses of this word
> are different.
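To make the quoted test setup concrete, here is a minimal Python sketch of a glb-seeking (greatest lower bound) binary search, and of one way a probe set of 127 keys arises for a 63-element table: each element, plus one key in each of the 64 gaps below, between, and above the elements. The names glb_index, probe_keys and run_benchmark, the integer elements (standing in for the quadword entries), and the probe set itself are assumptions, not the routines that were measured. Being interpreted Python, it illustrates only the search logic and says nothing about AMODE(31) versus AMODE(64) timing.

```python
def glb_index(table, key):
    """Return the index of the greatest element <= key (the glb), or -1 if none.

    Assumes `table` is sorted in ascending order.
    """
    lo, hi = 0, len(table)  # invariant: table[:lo] <= key < table[hi:]
    while lo < hi:
        mid = (lo + hi) // 2
        if table[mid] <= key:
            lo = mid + 1  # the glb is at mid or to its right
        else:
            hi = mid      # every element from mid onward exceeds key
    return lo - 1


def probe_keys(table):
    """One key below the table, each element, and one key just above each element.

    Hypothetical probe set: assumes integer elements spaced at least 2 apart,
    so that element + 1 always falls in a gap. For n elements this yields
    2n + 1 keys, e.g. 127 for a 63-element table.
    """
    keys = [table[0] - 1]
    for element in table:
        keys.extend((element, element + 1))
    return keys


def run_benchmark(table, repeats=256):
    """Search for every probe key `repeats` times; return the glb indexes of the final pass."""
    keys = probe_keys(table)
    for _ in range(repeats - 1):
        for key in keys:
            glb_index(table, key)
    return [glb_index(table, key) for key in keys]


table = [2 * i for i in range(1, 64)]  # 63 ascending keys: 2, 4, ..., 126
results = run_benchmark(table)
```

Wrapping the run_benchmark call in time.perf_counter() readings would give a crude elapsed time, but only for this interpreter on this machine.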

From a hardware design engineer:
<quote>
All hardware instructions perform at the same speed in 64-bit mode or 
31-bit mode.  I assume the AMODE(31) and AMODE(64) he is referring to
only affect the addressing mode, but the exact same instruction
sequences are used in both cases. If different code sequences are being
used, then all bets are off.  My first statement applies to the 
exact same code sequence in 64-bit addressing mode versus 31-bit
addressing mode. A few millicoded instructions do have slightly 
different path lengths depending on addressing mode, but even that
is not common.
<endquote>

  If you can send me the listings of the exact code that you are
measuring, I might be able to analyze the difference that
you are measuring.

  There certainly have been cases over the years where 
some processors required extra cycles to perform operand extension,
especially when it involves sign-bit propagation.  For specific
instructions on a specific processor, I can ask the engineers if
that is the case (as long as it is a recent enough processor that 
the engineers are still here). 
 
Jim Mulder   z/OS System Test   IBM Corp.  Poughkeepsie,  NY

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN

