Re: Circular Queue Handling in Assembler

Dan Greiner Fri, 02 Aug 2019 15:14:13 -0700

I may have posted comments on this topic in the past (in which case, I 
apologize in advance for being repetitious).  However, when somebody advocates 
using PERFORM LOCKED OPERATION (PLO), I feel a strong need to intervene.


Conceptually, PLO is little different from a separate stream of individual 
instructions for performing the following: (a) serializing access by acquiring 
a lock; (b) doing a comparison; (c) if the comparison is equal, then loading, 
swapping, and/or storing one or more atoms of data; and (d) releasing the lock. 
As with any manually written locking protocol, the use of PLO requires that 
anybody playing in this sandbox MUST follow the same protocol when messing with 
the data atoms.
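
To make steps (a) through (d) concrete, here is a rough C sketch of what a 
"compare and swap and store" style of operation amounts to conceptually. It is 
not PLO and not how the firmware does it: the pthread mutex merely stands in 
for the hidden lock, and the names plo_lock and locked_cs_and_store are made up 
for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the lock that PLO keeps hidden in firmware. */
static pthread_mutex_t plo_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Conceptual "compare and swap and store" (illustrative name, not an API):
 *   (a) acquire the lock
 *   (b) compare *op1 with *expected
 *   (c) if equal, store new_op1 into *op1 and new_op2 into *op2
 *   (d) release the lock
 * Returns true when the stores were done. On a mismatch nothing is stored,
 * the current value of *op1 is copied into *expected, and false is returned.
 */
static bool locked_cs_and_store(uint64_t *op1, uint64_t *expected,
                                uint64_t new_op1,
                                uint64_t *op2, uint64_t new_op2)
{
    assert(op1 != NULL && expected != NULL && op2 != NULL);

    pthread_mutex_lock(&plo_lock);          /* (a) serialize */
    bool equal = (*op1 == *expected);       /* (b) compare */
    if (equal) {                            /* (c) update both atoms */
        *op1 = new_op1;
        *op2 = new_op2;
    } else {
        *expected = *op1;                   /* report what was found */
    }
    pthread_mutex_unlock(&plo_lock);        /* (d) release */
    return equal;
}
```

The point of the sketch is that the two stores are safe only against other 
callers of locked_cs_and_store. Anything that writes to the same storage 
without taking plo_lock is outside the protocol.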

However, with PLO, the lock is hidden in the firmware (somewhere up in HSA), 
and, as observed by other CPUs that may not be using PLO, the examination and 
updating of the data atoms are not serialized in any way. Depending on the 
model, the number of lock tokens requested by the program (i.e., that thing 
that GR1 points to) may exceed the actual number of firmware locks available in 
a configuration. Some machines may have a few locks for the whole machine, 
others may have a lock or two per logical partition, others may have more. 

PLO is implemented in firmware, so its performance may be worse than that of 
carefully crafted assembler code. The interface for using PLO, that is, (a) 
establishing a lock token to identify the thing being serialized, (b) building 
a parameter list for most functions, and (c) dealing with a nonzero condition 
code, does not follow the path of least astonishment. 

Most importantly (forgive me if I repeat myself), PLO does not play well with 
others — that is, with programs that use conventional serialization techniques 
such as COMPARE AND SWAP. This has been the bane of various MVS components in 
the past (where a maintenance programmer decided to use C&S on top of a 
PLO-managed queue), and it has caused similar customer problems as well. See 
programming note 4 on page 7-343 of the latest PoO (SA22-7832-11) for details.
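
For anyone who wants to see the failure mode in miniature, the C sketch below 
replays one bad interleaving in a single thread. It is a simplified analogy 
under my own assumptions, not PLO itself: lock_held stands in for the hidden 
lock, the lock-protocol update is split into two halves so the interloper can 
run in between, and all of the names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared structure: a lock flag plus one counted field. */
typedef struct {
    bool lock_held;            /* stand-in for the lock the protocol uses */
    _Atomic uint64_t count;    /* the data atom everybody is updating */
} queue_t;

/* Lock-protocol updater, first half: take the lock and read the atom. */
static uint64_t locked_begin(queue_t *q)
{
    assert(!q->lock_held);
    q->lock_held = true;
    return atomic_load(&q->count);
}

/* Second half: store a value derived from the earlier read, then unlock. */
static void locked_end(queue_t *q, uint64_t seen)
{
    atomic_store(&q->count, seen + 1);
    q->lock_held = false;
}

/* Interloper: a compare-and-swap increment that never looks at the lock. */
static void cas_increment(queue_t *q)
{
    uint64_t old = atomic_load(&q->count);
    while (!atomic_compare_exchange_weak(&q->count, &old, old + 1)) {
        /* on failure, old has been refreshed; just retry */
    }
}

/* Replays: locked read, CAS increment, locked store. Returns final count. */
static uint64_t replay_lost_update(void)
{
    queue_t q;
    q.lock_held = false;
    atomic_init(&q.count, 0);

    uint64_t seen = locked_begin(&q);   /* lock holder reads 0 */
    cas_increment(&q);                  /* succeeds despite the held lock */
    locked_end(&q, seen);               /* overwrites with 0 + 1 */
    return atomic_load(&q.count);
}
```

Two increments were attempted, yet replay_lost_update() returns 1: the CAS 
succeeded because it never consulted the lock, and the lock holder then stored 
over it. Each side is correct by its own rules; the queue is corrupted only 
because the two protocols were mixed.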

In his post of 30 July, Mr. Weinhold mentioned the transactional-execution (TX) 
facility introduced on the zEC12 (circa September 2012).  TX is the ultimate 
solution to performing multiple serialized updates without the use of locks, 
and its performance can scale exceedingly well (as compared to locks) when a 
large number of CPUs are contending for shared memory. (See programming note 1 
on p. 7-343.)
