There are bigger problems than cache in that example: control falls through into the TR, so it is translated twice, the first time in line with its assembled length of 1 (the *-* length field assembles to zero).
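One way to sidestep the double translation is to keep the EX target out of the fall-through path. A hypothetical rework of Example 1 (the DoEX label and the choice of J are mine, not from the thread):

```hlasm
* Sketch only: branch around the TR so it is reached solely as the
* target of the EX, never executed in line with its assembled length.
          J     DoEX                never fall through into the EX target
TxtTRNull TR    0(*-*,R14),NoNulls  executed only via the EX below
DoEX      EX    R5,TxtTRNull        low byte of R5 is OR'd into the TR length field
```

The relative J keeps this pattern base-register-free, in keeping with the base-less theme of the thread.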
--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3

________________________________________
From: IBM Mainframe Assembler List [[email protected]] on behalf of Tony Harminc [[email protected]]
Sent: Wednesday, November 10, 2021 11:53 PM
To: [email protected]
Subject: Re: Avoiding SIIS - (Was Base-less macros)

On Wed, 10 Nov 2021 at 11:45, Wendell Lovewell <[email protected]> wrote:
>
> I'm reluctant to admit this, but I'm still unclear about SIIS issues. Could
> someone please explain what happens to the D- and I-cache in these situations?

I can tell you my understanding, but it's certainly not definitive. I just read the books that the people who really know write.

Most probably none of these examples does a SIIS, but there isn't enough information to be 100% sure. I will assume that none of R14 in examples 1 and 2, or R5 in example 3, points anywhere near the code you are showing. I will also assume that a cache line is 256 bytes, as it is on all recent machines. (You can use the unprivileged ECAG instruction to ask the machine what the I- and D-cache line sizes are. But of course you can't take that run-time information and turn it back into input to the assembler. A Just-In-Time compiler, e.g. for Java, could do that and probably does.)

> Example 1:
> TxtTRNull TR    0(*-*,R14),NoNulls
>           EX    R5,*-6

No SIIS. The fetch of the target of EX/EXRL is defined to be an instruction fetch, so chances are the TR and the EX are in the same I-cache line, in which case that is the only I-cache line in use locally. If the TR happens to end on a cache line boundary, then fetching the EX will bring in another I-cache line. The line containing the TR is unlikely to be discarded, because it was just used. The two operands of TR could involve as many as two D-cache lines each, or as few as one in total, depending on where the operands lie.

> Example 2:
> PgmConst  LOCTR ,
> TxtTRNull TR    0(*-*,R14),NoNulls
> PgmCode   LOCTR ,
>           EX    R5,TxtTRNull

No SIIS.
Almost the same as example 1, but the TR and EX are more likely to be in different cache lines because they're more likely to be further apart.

You don't show it, but if the PgmConst area contains data as well as the TR instruction, then referencing that data will bring it into a D-cache line. It's not wrong to have this situation, and any performance hit should come only from the fact of having the same bytes in two cache lines, and therefore excluding some other info. I don't believe there is any direct interaction as long as nobody stores into either area. But note that that includes any code that stores into any part of the cache line, and that in turn includes code that may not be executed in a given case but that has been fetched and analysed as part of branch prediction.

> Example 3:
>           GENCB BLK=ACB,AM=VSAM,MACRF=(KEY,DIR,SEQ,IN),
>                 LOC=ANY,RMODE31=ALL,
>                 MF=(G,(R5),GENCBLN)
> +GENCBLN  EQU   56                  LENGTH OF PARM LIST AREA USED
> +         CNOP  0,4
> +         BAL   15,*+44             BRANCH OVER CONSTANTS
> +         DC    AL1(160)            BLOCK TYPE CODE
> +         DC    AL1(1)              FUNCTION TYPE CODE
> (16 "DC" lines removed)
> +         DC    AL2(0)              RESERVED @
> +         DC    B'10000000000000000000000000000000'
> +         LR    1,R5                POINT TO PARAMETER LIST AREA
> +         MVC   16(40,1),0(15)      MOVE ACES TO AREA

No SIIS. Some of the 40 bytes after the BAL 15,*+44 are likely to be in both kinds of cache line after MVC executes. Still no problem - again as long as nobody is storing into any part of the data that lives in the cache line.

Oh - and after typing all that, here's the quote I've been looking for from the IBM Z / LinuxONE System Processor Optimization Primer that was mentioned here a few months ago:

"No performance concern is expected with read-only copies of the same cache line in both the instruction and data caches.
The SIIS inefficiency occurs when the processor detects the same line is in both the instruction and data caches and the data cache's copy is potentially to be updated (including any conditional paths not expected to be executed), at which point an expensive cache synchronization action is needed. So long as both copies of the line in the instruction and data caches remain identical, the synchronization action does not occur, and there should be no performance penalty."

> (Please forgive the formatting - it's tough to line things up in a
> proportional font.)

Virtually impossible. I see some of your lines well aligned; others not so much. But it's not uncomfortable to read, so thanks for taking the trouble.

Tony H.
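For contrast with the examples above, here is a sketch of the pattern the Primer's penalty does apply to: a store into a cache line that also holds code being executed. The labels and registers are hypothetical, not from the thread; this is the old self-modifying idiom of storing a length byte into an in-line MVC.

```hlasm
* Hypothetical SIIS trigger (for illustration only, do not imitate):
* the STC stores into the same cache line that holds the MVC being
* executed, so the I-cache and D-cache copies of the line are no
* longer identical - exactly the case that forces the expensive
* cache synchronization action described in the Primer.
         STC   R5,MoveIt+1         modify the MVC's length byte in storage
MoveIt   MVC   0(*-*,R2),0(R3)     length byte at MoveIt+1 was just stored into
```

Supplying the length through EX instead, as in Wendell's examples, performs no store into the instruction line, so both cached copies stay identical and no synchronization is needed.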
