On Thu, 12 Nov 2015, David Daney wrote:

> > > Certainly we can load up the code with "SYNC" all over the place, but
> > > it will kill performance on SMP systems.  So, my vote would be to make
> > > it as light weight as possible, but no lighter.  That will mean
> > > inventing the proper barrier primitives.
> >
> > It seems to me that the proper barrier here is a "SYNC 18" aka
> > SYNC_RELEASE instruction, at least on CPUs that implement that variant.

For the record, we've had "cooked" aliases in the toolchain for a short
while now -- since Sep 2010 or binutils 2.21 -- so for readability you
can actually use `sync_release' in your source code rather than the
obscure `sync 18' (of course you could define a macro instead, but
there's no need now), and disassembly will show the "cooked" mnemonic
too.  However Documentation/Changes still lists binutils 2.12 as the
minimum, so perhaps using macros is indeed the way to go, at least for
the time being.

> Yes, unfortunately very few CPUs implement that.  It is an instruction that
> MIPS invented only recently, so older CPUs need a different solution.

Hmm, it looks to me we might actually be safe, although as is often the
case the situation seems more complicated than it had to be.

Conventional wisdom says that SYNC as the ultimate ordering barrier, aka
SYNC 0, was added with the MIPS II ISA, with a provision to define less
restrictive barriers in the future in a backward compatible manner, by
means of undefined (any non-zero at the time) barrier types defaulting
to 0.

Early references seem to have been lost in the mists of time, however a
few legacy MIPS ISA documents remain, e.g. the MIPS IV ISA document
says[1]:

"The stype values 1-31 are reserved; they produce the same result as the
value zero."

making it clear that non-zero arguments will work as expected, albeit
perhaps with a somewhat heavyweight effect.  But there's sometimes no
other way.

Earlier documentation that is still available is more ambiguous, e.g.
the MIPS R4000 processor manual omits any mention of `stype' altogether
and merely defines a single SYNC instruction encoding with all-zeros
across bits 25:6 of the instruction word, among which `stype' normally
lives[2].  The same appears to hold for other MIPS III processor
documentation (e.g. IDT 79RV4700[3]).
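
To make the encoding discussed above concrete, here is a minimal C
sketch of how the `stype' field sits in the SYNC instruction word.  The
helper names are hypothetical and only meant for illustration; the field
positions are my reading of the architecture manuals (SPECIAL major
opcode in bits 31:26, zeros in bits 25:11, `stype' in bits 10:6 and
function code 0x0f in bits 5:0).

```c
#include <stdint.h>

/* Field layout of SYNC, as assumed in the text above. */
#define MIPS_SYNC_FUNCT		0x0fu	/* minor opcode, bits 5:0 */
#define MIPS_STYPE_SHIFT	6	/* stype lives in bits 10:6 */
#define MIPS_STYPE_MASK		0x1fu

/* Hypothetical helper: build the instruction word for "sync <stype>". */
static uint32_t mips_sync_word(unsigned int stype)
{
	return ((stype & MIPS_STYPE_MASK) << MIPS_STYPE_SHIFT) |
	       MIPS_SYNC_FUNCT;
}

/* Hypothetical helper: extract stype from a SYNC instruction word. */
static unsigned int mips_sync_stype(uint32_t insn)
{
	return (insn >> MIPS_STYPE_SHIFT) & MIPS_STYPE_MASK;
}

/*
 * Hypothetical model of a decoder that only looks at the major and the
 * minor opcode: every stype value is then treated as plain SYNC.
 */
static int mips_is_sync_loose(uint32_t insn)
{
	return (insn >> 26) == 0 && (insn & 0x3fu) == MIPS_SYNC_FUNCT;
}
```

With this layout `sync 18' comes out as the word 0x0000048f, which is
also what a macro could emit with `.word' where the assembler is too old
to know the operand or the `sync_release' alias.
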
However I'm fairly sure all these processors simply did not bother
decoding SYNC beyond the major and minor opcode, so again SYNC 0
semantics should hold across the more recently defined variants.  I
could actually verify this sometime with an R4000-class processor.

Modern MIPS architecture specifications started with the same definition
as the MIPS IV ISA had, rev. 0.95 documents still stated[4][5]:

"The stype values 1-31 are reserved; they produce the same result as the
value zero."

Unfortunately the requirement got weakened later on, rev. 1.00
architecture specifications then stated[6][7]:

"The stype values 1-31 are reserved for future extensions to the
architecture.  A value of zero will always be defined such that it
performs all defined synchronization operations.  Non-zero values may be
defined to remove some synchronization operations.  As such, software
should never use a non-zero value of the stype field, as this may
inadvertently cause future failures if non-zero values remove
synchronization operations."

I think the intent was not to break backwards compatibility, and
certainly anyone who looked at one of the earlier documents might have
realised that implementing non-zero SYNC operations that do not have
vendor-specific semantics as aliases to SYNC 0, rather than as NOPs or
RI triggers, would be a good idea.  However implementers may not have
been able to infer that from reading the then current revision of the
architecture documents alone.

It was only with rev. 2.60 of the architecture specifications that,
along with new SYNC operations, the requirement for undefined SYNC
operations to behave as SYNC 0 was put back in the text in an
unambiguous form[8][9]:

"A stype value of zero will always be defined such that it performs the
most complete set of synchronization operations that are defined.  This
means stype zero always does a completion barrier that affects both
loads and stores preceding the SYNC instruction and both loads and
stores that are subsequent to the SYNC instruction.
Non-zero values of stype may be defined by the architecture or specific
implementations to perform synchronization behaviors that are less
complete than that of stype zero.  If an implementation does not use one
of these non-zero values to define a different synchronization behavior,
then that non-zero value of stype must act the same as stype zero
completion barrier.  This allows software written for an implementation
with a lighter-weight barrier to work on another implementation which
only implements the stype zero completion barrier."

This definition has been retained in the architecture specification ever
since.

Overall I think it should be safe after all to use SYNC_RELEASE and
other modern lightweight barriers unconditionally, under the assumption
that the architecture was meant to remain backward compatible.  Even
though it might be possible that someone implemented unusual semantics
for the then undefined `stype' values, I highly doubt it as it would be
extra effort and hardware logic space for no gain.

We could try and reach the architecture overseers to double-check
whether the `stype' encodings, somewhat irregularly distributed, were
indeed defined in a manner so as not to clash with values implementers
chose to use before rev. 2.61 of the architecture specification.

Then, for performance reasons, if there were indeed any pre-2.61
implementations which define vendor-specific lightweight barriers, we
could replace the standard encoding embedded in the kernel binary, by
run-time patching the image up at bootstrap, based on the processor type
identified in cpu-probe.c.

Likewise, for implementations that are weakly enough ordered to define
SYNC as an actual barrier rather than a different encoding of NOP (e.g.
the NEC VR4100 is strongly ordered and implements SYNC as a NOP[10]),
yet strongly enough ordered for some of the other barriers not to be
necessary, the respective barriers could be patched up with NOPs.
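
As a rough illustration of the run-time patching idea, here is a minimal
sketch in plain C that operates on an array of instruction words rather
than on a live kernel image.  Everything in it is hypothetical: the
policy enumeration, the table of patch sites and the function names are
made up for the example, and nothing like this exists in cpu-probe.c
today.  A real implementation would also have to synchronise the
instruction cache after rewriting code.

```c
#include <stddef.h>
#include <stdint.h>

#define INSN_NOP	0x00000000u		/* sll $0, $0, 0 */
#define INSN_SYNC	0x0000000fu		/* sync 0 */
#define SYNC_STYPE_MASK	(0x1fu << 6)		/* stype, bits 10:6 */

/* Hypothetical per-CPU policy, e.g. chosen from the processor type. */
enum sync_policy {
	SYNC_KEEP,		/* undefined stype acts as SYNC 0: leave */
	SYNC_FORCE_FULL,	/* vendor-specific stype: fall back to SYNC 0 */
	SYNC_TO_NOP,		/* lightweight barrier not needed: use NOP */
};

/* True for any SYNC word, whatever its stype. */
static int is_sync(uint32_t insn)
{
	return (insn & ~SYNC_STYPE_MASK) == INSN_SYNC;
}

/*
 * Rewrite lightweight barriers (non-zero stype) at the recorded sites
 * according to the policy.  Plain SYNC 0 and non-SYNC words are left
 * alone.  Returns the number of words rewritten.
 */
static size_t patch_sync_sites(uint32_t *image, const size_t *sites,
			       size_t nsites, enum sync_policy policy)
{
	size_t patched = 0;

	for (size_t i = 0; i < nsites; i++) {
		uint32_t *insn = &image[sites[i]];

		if (!is_sync(*insn) || (*insn & SYNC_STYPE_MASK) == 0)
			continue;	/* not a lightweight barrier */

		switch (policy) {
		case SYNC_KEEP:
			continue;
		case SYNC_FORCE_FULL:
			*insn = INSN_SYNC;
			break;
		case SYNC_TO_NOP:
			*insn = INSN_NOP;
			break;
		}
		patched++;
	}
	return patched;
}
```

Recording the sites in a table rather than scanning the whole image
keeps the patcher from touching data words that merely happen to look
like a SYNC encoding.
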

For I/O ordering and completion barriers, mentioned earlier in the
thread, on the MIPS target we need a different set of primitives, as
some early incarnations of the architecture were weakly ordered in this
respect in a somewhat unusual way, at least to some.  Only reads were
strongly ordered in all cases.  However writes could bypass each other,
could be merged, or could be removed altogether (preempted by a later
one).  Then reads could bypass writes or read back a pending write.
None of this matters for true memory, however it certainly does for I/O,
where side effects exist or timely completion is required.

I have previously outlined what needs to be implemented in this area, as
recorded here:
<http://www.linux-mips.org/cgi-bin/mesg.cgi?a=linux-mips&i=alpine.LFD.2.11.1404280048540.11598%40eddie.linux-mips.org>,
to unify the uncoordinated platform attempts made so far.  I still have
it on my to-do list and hope to get to it soon.

References:

[1] "MIPS IV Instruction Set", MIPS Technologies, Inc., Revision 3.2,
    By Charles Price, September, 1995, p. A-161
    <http://techpubs.sgi.com/library/manuals/2000/007-2597-001/pdf/007-2597-001.pdf>

[2] Joe Heinrich: "MIPS R4000 Microprocessor User's Manual", Second
    Edition, MIPS Technologies, Inc., April 1, 1994, p. A-161
    <http://techpubs.sgi.com/library/manuals/2000/007-2489-001/pdf/007-2489-001.pdf>

[3] "IDT79RV4700 RISC Processor Hardware User's Manual", Integrated
    Device Technology, Inc., Version 2.1, December 1997, p. A-130

[4] "MIPS32 Architecture For Programmers, Volume II: The MIPS32
    Instruction Set", MIPS Technologies, Inc., Document Number: MD00086,
    Revision 0.95, March 12, 2001, p. 215

[5] "MIPS64 Architecture For Programmers, Volume II: The MIPS64
    Instruction Set", MIPS Technologies, Inc., Document Number: MD00087,
    Revision 0.95, March 12, 2001, p. 300

[6] "MIPS32 Architecture For Programmers, Volume II: The MIPS32
    Instruction Set", MIPS Technologies, Inc., Document Number: MD00086,
    Revision 1.00, August 29, 2002, p. 209

[7] "MIPS64 Architecture For Programmers, Volume II: The MIPS64
    Instruction Set", MIPS Technologies, Inc., Document Number: MD00087,
    Revision 1.00, August 29, 2002, p. 295

[8] "MIPS32 Architecture For Programmers, Volume II: The MIPS32
    Instruction Set", MIPS Technologies, Inc., Document Number: MD00086,
    Revision 2.60, June 25, 2008, p. 250

[9] "MIPS64 Architecture For Programmers, Volume II: The MIPS64
    Instruction Set", MIPS Technologies, Inc., Document Number: MD00087,
    Revision 2.60, June 25, 2008, p. 317

[10] "VR4100 64-BIT MICROPROCESSOR USER'S MANUAL (PRELIMINARY)", NEC
     Corporation, Document No. U10050EJ3V0UM00 (3rd edition), January
     1996, p. 413

  Maciej

