The S/370 PoOps mentions the vector facility and says "Vector operations are described in the publication IBM System/370 Vector Operations, SA22-7125."
You can download it from http://bitsavers.org/pdf/ibm/370/vectorFacility/SA22-7125-3_Vector_Operations_Aug88.pdf

--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3

________________________________________
From: IBM Mainframe Assembler List [[email protected]] on behalf of Dan Greiner [[email protected]]
Sent: Friday, June 5, 2020 2:56 PM
To: [email protected]
Subject: Re: Does the z architecture have something like the SIMD instructions

Although it does actually access multiple data items, PERFORM LOCK OPERATION (PLO) really doesn't qualify as a SIMD instruction (see my PLO screed below). Seymour's reference to the Wikipedia page (https://en.wikipedia.org/wiki/SIMD) is about as adequate a definition as any I've seen.

As I recall, IBM's original implementation of vector instructions appeared as an optional extension to ESA/390, but these were never part of the standard architecture defined in the PoO.

With the advent of the z13 (2015), IBM added vector instructions to the general architecture, and added Chapters 21-24 to the PoO. There are 32 vector registers (VRs), each having 128 bits ... but the leftmost 64 bits of VRs 0-15 are the same as floating-point registers 0-15. This is not to say that VRs are necessarily floating-point entities; they can hold binary integers, strings, or floating-point values.
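To make the register overlay concrete, here is a minimal toy model in Python. It is only a sketch of the layout described above, not of any real instruction: the class and method names are hypothetical, and it assumes the floating-point value is an IEEE binary64 number stored big-endian. What real hardware does to the rest of a VR when an FPR is written is not modeled.

```python
import struct


class VectorRegisterFile:
    """Toy model (hypothetical names): 32 vector registers of 128 bits each."""

    def __init__(self):
        # 32 registers, 16 bytes (128 bits) apiece, all zero to start.
        self.vr = [bytearray(16) for _ in range(32)]

    def set_vr(self, n, data):
        """Replace all 16 bytes of vector register n."""
        if len(data) != 16:
            raise ValueError("a vector register holds exactly 16 bytes")
        self.vr[n][:] = data

    def get_vr(self, n):
        """Return the 16 bytes of vector register n."""
        return bytes(self.vr[n])

    def set_fpr(self, n, value):
        """Write FPR n, which is the leftmost 8 bytes of VR n (n = 0-15)."""
        if not 0 <= n <= 15:
            raise ValueError("only FPRs 0-15 exist; VRs 16-31 have no FPR view")
        # Assumption: the value is stored as a big-endian IEEE binary64.
        # This toy leaves bytes 8-15 untouched; do not rely on that elsewhere.
        self.vr[n][0:8] = struct.pack(">d", value)

    def get_fpr(self, n):
        """Read FPR n back out of the leftmost 8 bytes of VR n."""
        if not 0 <= n <= 15:
            raise ValueError("only FPRs 0-15 exist; VRs 16-31 have no FPR view")
        return struct.unpack(">d", bytes(self.vr[n][0:8]))[0]
```

Writing through either view is visible through the other: after `rf.set_fpr(3, 1.5)`, the first 8 bytes of `rf.get_vr(3)` hold the encoding of 1.5, while VRs 16-31 can only be reached as full vector registers.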
With the introduction of the z14 (2017), IBM added (a) new instructions that enhanced the existing VR facility, and (b) a vector packed-decimal facility (the latter being a benefit to COBOL and other packed players). With the introduction of the z15 (2019), IBM added a second enhancement to the VR facility. There are now around 190 separate vector instructions, with a mind-boggling array of extended mnemonics. If you haven't bothered to download a PoO in the last few years, it's worth it (but if you choose to print it, have two reams of paper handy). Check out SA22-7832-15 for the latest version.

Regarding PLO, this provides the means by which multiple, discontiguous storage locations can appear to be updated atomically without having to bother acquiring a lock. However, in order for PLO to operate properly, EVERY program that inspects or modifies those storage locations also has to do it with PLO. This is because the firmware for PLO gets its own lock in HSA, and serializes other CPUs' attempts to use PLO with that lock. If other programs on other CPUs examine the data without using PLO, the updates do not necessarily appear to be atomic. And, if some programs use PLO and others try to perform updates with classic compare-and-swap logic, really BAD things happen (as certain z/OS developers have discovered more than once). If nobody was actually using PLO, I would have quietly proposed removing it from the architecture, but (alas) there are some OS components that have actually managed to use it properly.

For a far more flexible (and higher performance) means of atomic updates of multiple storage locations, check out the transactional-execution facility introduced in the zEC12 (2012).
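The "everyone must use PLO" rule can be illustrated with a small Python analogy. This is only a sketch of the discipline, not of the instruction itself: `PairedWords`, `update_both`, `read_both`, and `run_demo` are hypothetical names, and an ordinary `threading.Lock` stands in for the lock that the PLO firmware takes on the program's behalf.

```python
import threading


class PairedWords:
    """Two words that must always be observed as changing together."""

    def __init__(self):
        # Stand-in for the lock the firmware holds during a PLO operation.
        self._lock = threading.Lock()
        self.head = 0
        self.count = 0

    def update_both(self, delta):
        """Update both words under the shared lock (the PLO-style path)."""
        with self._lock:
            self.head += delta
            self.count += delta

    def read_both(self):
        """Inspect both words under the same lock, so no half-update shows."""
        with self._lock:
            return self.head, self.count


def run_demo(updates=2000):
    """One thread updates while this thread reads; count torn observations."""
    pair = PairedWords()
    mismatches = 0

    def writer():
        for _ in range(updates):
            pair.update_both(1)

    t = threading.Thread(target=writer)
    t.start()
    while t.is_alive():
        head, count = pair.read_both()
        if head != count:
            mismatches += 1
    t.join()
    return pair.read_both(), mismatches
```

Because every reader and writer goes through the same lock, `run_demo` never observes `head` and `count` out of step. A reader that looked at `pair.head` and `pair.count` directly, or a writer that changed one of them by some other mechanism, would bypass the lock entirely, which is the analogue of mixing PLO with plain loads or compare-and-swap.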
