I may have posted comments on this topic in the past (in which case, I apologize in advance for being repetitious). However, when somebody advocates using PERFORM LOCKED OPERATION (PLO), I feel a strong need to intervene.
Conceptually, PLO is little different from a separate stream of individual instructions that perform the following: (a) serialize access by acquiring a lock; (b) do a comparison; (c) if the comparison is equal, load, swap, and/or store one or more atoms of data; and (d) release the lock. As with any manually written locking protocol, the use of PLO requires that anybody playing in this sandbox MUST follow the same protocol when messing with the data atoms.

However, with PLO, the lock is hidden in the firmware (somewhere up in HSA), and, as observed by other CPUs that may not be using PLO, the examination and updating of the data atoms are not serialized in any way.

Depending on the model, the number of lock tokens requested by the program (i.e., that thing that GR1 points to) may exceed the actual number of firmware locks available in a configuration. Some machines may have a few locks for the whole machine, others may have a lock or two per logical partition, and others may have more.

PLO is implemented in firmware, so its performance may be less than that of carefully crafted assembler code.

The interface for using PLO, that is, (a) establishing a lock token to identify the thing being serialized, (b) building a parameter list for most functions, and (c) dealing with a nonzero condition code, does not follow the path of least astonishment.

Most importantly (forgive me if I repeat myself), PLO does not play well with others, that is, with programs that use conventional serialization techniques such as COMPARE AND SWAP. This has been the bane of various MVS components in the past (where a maintenance programmer decided to use C&S on top of a PLO-managed queue), and it has caused similar customer problems as well. See programming note 4 on page 7-343 of the latest PoO (SA22-7832-11) for details.

In his post of 30 July, Mr. Weinhold mentioned the transactional-execution (TX) facility introduced on the z12 (circa September 2012). TX is the ultimate solution to performing multiple serialized updates without the use of locks, and its performance can scale exceedingly well (as compared to locks) when a large number of CPUs are contending for shared memory. (See programming note 1 on p. 7-343.)
