Sorry, senior moment. But it will do it twice, and the double translate of 
the first byte is probably not what you want.


--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3

________________________________________
From: IBM Mainframe Assembler List [[email protected]] on behalf 
of Keith Moe [[email protected]]
Sent: Thursday, November 11, 2021 4:51 PM
To: [email protected]
Subject: Re: Avoiding SIIS - (Was Base-less macros)

Actually, the inline TR/EX will do the TR the first time for ONE byte, not 256, 
followed by the EX of the specified length.
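
A hedged sketch of the hazard and one common way around it (label names are 
mine, patterned on Wendell's Example 1): falling into the TR executes it 
inline with a length code of zero, so one byte gets translated before the EX 
ever runs. Branching over the inline target avoids that extra translate 
(Example 2's LOCTR achieves the same thing by moving the target elsewhere):

         J     DOEX                skip the EX target on fall-through
TXTTRNUL TR    0(*-*,R14),NONULLS  executed only as the target of EX
DOEX     EX    R5,TXTTRNUL         low-order byte of R5 supplies length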

Keith Moe
BMC Software


     On Thursday, November 11, 2021, 01:44:07 PM PST, Seymour J Metz 
<[email protected]> wrote:

 There are bigger problems than cache in that example; the EX/TR will translate 
twice, the first time with a length of 256.


--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3

________________________________________
From: IBM Mainframe Assembler List [[email protected]] on behalf 
of Tony Harminc [[email protected]]
Sent: Wednesday, November 10, 2021 11:53 PM
To: [email protected]
Subject: Re: Avoiding SIIS - (Was Base-less macros)

On Wed, 10 Nov 2021 at 11:45, Wendell Lovewell
<[email protected]> wrote:
>
> I'm reluctant to admit this, but I'm still unclear about SIIS issues.  Could 
> someone please explain what happens to the D- and I-cache in these situations?

I can tell you my understanding, but it's certainly not definitive. I
just read the books that the people who really know write.

Most probably none of these examples does a SIIS, but there isn't
enough information to be 100% sure. I will assume that none of R14 in
examples 1 and 2, or R5 in example 3, points anywhere near the code
you are showing. I will also assume that a cache line is 256 bytes, as
it is on all recent machines. (You can use the unprivileged ECAG
instruction to ask the machine what the I- and D- cache line sizes
are. But of course you can't take that run-time information and turn
it back into input to the assembler. A Just-In-Time compiler, e.g. for
Java, could do that, and probably does.)

> Example 1:
> TxtTRNull    TR    0(*-*,R14),NoNulls
>                      EX      R5,*-6

No SIIS. The fetch of the target of EX/EXRL is defined to be an
instruction fetch, so chances are the TR and the EX are in the same
I-cache line, in which case that is the only I-cache line in use
locally. If the TR happens to end on a cache line boundary, then
fetching the EX will bring in another I-cache line. The line
containing the TR is unlikely to be discarded, because it was just
used.

The two operands of TR could involve as many as two D-cache lines
each, or as few as one in total, depending on where the operands lie.

> Example 2:
> PgmConst  LOCTR ,
> TxtTRNull    TR    0(*-*,R14),NoNulls
> PgmCode    LOCTR ,
>                      EX      R5,TxtTRNull

No SIIS. Almost the same as example 1, but the TR and EX are more
likely to be in different cache lines because they're more likely to
be further apart. You don't show it, but if the PgmConst area contains
data as well as the TR instruction, then referencing that data will
bring it into a D-cache line. It's not wrong to have this situation,
and any performance hit should come only from the fact of having the
same bytes in two cache lines, and therefore excluding some other
info. I don't believe there is any direct interaction as long as
nobody stores into either area. But note that that includes any code
that stores into any part of the cache line, and that in turn includes
code that may not be executed in a given case but that has been
fetched and analysed as part of branch prediction.

> Example 3:
>          GENCB BLK=ACB,AM=VSAM,MACRF=(KEY,DIR,SEQ,IN),
>                LOC=ANY,RMODE31=ALL,
>                MF=(G,(R5),GENCBLN)
> +GENCBLN  EQU  56            LENGTH OF PARM LIST AREA USED
> +        CNOP  0,4
> +        BAL  15,*+44                BRANCH OVER CONSTANTS
> +        DC    AL1(160)                BLOCK TYPE CODE
> +        DC    AL1(1)                    FUNCTION TYPE CODE
> (16 "DC" lines removed)
> +        DC    AL2(0)                  RESERVED                    @
> +        DC    B'10000000000000000000000000000000'
> +        LR    1,R5                      POINT TO PARAMETER LIST AREA
> +        MVC  16(40,1),0(15)      MOVE ACES TO AREA

No SIIS. Some of the 40 bytes after the BAL 15,*+44 are likely to be
in both kinds of cache line after MVC executes. Still no problem -
again as long as nobody is storing into any part of the data that
lives in the cache line.

Oh - and after typing all that, here's the quote I've been looking for
from the IBM Z / LinuxONE System Processor Optimization Primer that
was mentioned here a few months ago:

"No performance concern is expected with read-only copies of the same
cache line in both the instruction and data caches. The SIIS
inefficiency occurs when the processor detects the same line is in
both the instruction and data caches and the data cache's copy is
potentially to be updated (including any conditional paths not
expected to be executed), at which point an expensive cache
synchronization action is needed. So long as both copies of the line
in the instruction and data caches remain identical, the
synchronization action does not occur, and there should be no
performance penalty."
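
To make that concrete, here's a sketch (my own illustration, not from the
primer) of a pattern that does provoke the SIIS penalty, because the store
lands in a cache line that also holds code the processor has fetched:

         STC   R5,PATCH+1          store into the instruction stream
PATCH    MVC   OUT(*-*),IN         length byte patched at run time

The STC modifies a line that is (or is about to be) in the I-cache, which
is exactly the "data cache's copy is potentially to be updated" case the
primer describes, triggering the expensive synchronization action.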

> (Please forgive the formatting - it's tough to line things up in a 
> proportional font.)

Virtually impossible. I see some of your lines well aligned; others
not so much. But it's not uncomfortable to read, so thanks for taking
the trouble.

Tony H.
