On 11/04/2011 04:34 PM, Joseph S. Myers wrote:
Likewise.
I think I got all those, plus a couple of more I noticed along the way.
* doc/extend.texi: Document __atomic built-in functions.
* doc/invoke.texi: Document data race parameters.
* doc/md.texi: Document atomic patterns.
Index: extend.texi
===================================================================
*** extend.texi (revision 180839)
--- extend.texi (working copy)
*************** extensions, accepted by GCC in C90 mode
*** 79,85 ****
* Return Address:: Getting the return or frame address of a function.
* Vector Extensions:: Using vector instructions through built-in functions.
* Offsetof:: Special syntax for implementing @code{offsetof}.
! * Atomic Builtins:: Built-in functions for atomic memory access.
* Object Size Checking:: Built-in functions for limited buffer overflow
checking.
* Other Builtins:: Other built-in functions.
--- 79,86 ----
* Return Address:: Getting the return or frame address of a function.
* Vector Extensions:: Using vector instructions through built-in functions.
* Offsetof:: Special syntax for implementing @code{offsetof}.
! * __sync Builtins:: Legacy built-in functions for atomic memory access.
! * __atomic Builtins:: Atomic built-in functions with memory model.
* Object Size Checking:: Built-in functions for limited buffer overflow
checking.
* Other Builtins:: Other built-in functions.
*************** is a suitable definition of the @code{of
*** 6682,6689 ****
may be dependent. In either case, @var{member} may consist of a single
identifier, or a sequence of member accesses and array references.
! @node Atomic Builtins
! @section Built-in functions for atomic memory access
The following builtins are intended to be compatible with those described
in the @cite{Intel Itanium Processor-specific Application Binary Interface},
--- 6683,6690 ----
may be dependent. In either case, @var{member} may consist of a single
identifier, or a sequence of member accesses and array references.
! @node __sync Builtins
! @section Legacy __sync built-in functions for atomic memory access
The following builtins are intended to be compatible with those described
in the @cite{Intel Itanium Processor-specific Application Binary Interface},
*************** previous memory loads have been satisfie
*** 6815,6820 ****
--- 6816,7053 ----
are not prevented from being speculated to before the barrier.
@end table
+ @node __atomic Builtins
+ @section Built-in functions for memory model aware atomic operations
+
+ The following built-in functions approximately match the requirements of
+ the C++11 memory model. Many are similar to the @samp{__sync} prefixed
+ built-in functions, but all also have a memory model parameter. These are
+ all identified by being prefixed with @samp{__atomic}, and most are
+ overloaded such that they work with multiple types.
+
+ GCC will allow any integral scalar or pointer type that is 1, 2, 4, or 8
+ bytes in length. 16-byte integral types are also allowed if
+ @samp{__int128} (@pxref{__int128}) is supported by the architecture.
+
+ Target architectures are encouraged to provide their own patterns for
+ each of these built-in functions. If no pattern is provided, the original
+ non-memory model set of @samp{__sync} atomic built-in functions will be
+ utilized, along with any required synchronization fences surrounding them,
+ in order to achieve the proper behaviour. Execution in this case is subject
+ to the same restrictions as those built-in functions.
+
+ If there is no pattern or mechanism to provide a lock free instruction
+ sequence, a call is made to an external routine with the same parameters
+ to be resolved at runtime.
+
+ The four non-arithmetic functions (load, store, exchange, and
+ compare_exchange) all have a generic version as well. This generic
+ version will work on any data type. If the data type size maps to one
+ of the integral sizes which may have lock free support, the generic
+ version will utilize the lock free built-in function. Otherwise an
+ external call is left to be resolved at runtime. This external call has
+ the same format, with the addition of a @samp{size_t} parameter inserted
+ as the first parameter indicating the size of the object being pointed to.
+ All objects must be the same size.
+
+ There are six different memory models that can be specified. These map
+ to the same names in the C++11 standard. Refer there or to the
+ @uref{http://gcc.gnu.org/wiki/Atomic/GCCMM/AtomicSync,GCC wiki on
+ atomic synchronization} for more detailed definitions. These memory
+ models integrate both barriers to code motion as well as synchronization
+ requirements with other threads. These are listed in approximately
+ ascending order of strength.
+
+ @table @code
+ @item __ATOMIC_RELAXED
+ No barriers or synchronization.
+ @item __ATOMIC_CONSUME
+ Data dependency only for both barrier and synchronization with another
+ thread.
+ @item __ATOMIC_ACQUIRE
+ Barrier to hoisting of code and synchronizes with release (or stronger)
+ semantic stores from another thread.
+ @item __ATOMIC_RELEASE
+ Barrier to sinking of code and synchronizes with acquire (or stronger)
+ semantic loads from another thread.
+ @item __ATOMIC_ACQ_REL
+ Full barrier in both directions and synchronizes with acquire loads and
+ release stores in another thread.
+ @item __ATOMIC_SEQ_CST
+ Full barrier in both directions and synchronizes with acquire loads and
+ release stores in all threads.
+ @end table
+
+ When implementing patterns for these built-in functions, the memory model
+ parameter can be ignored as long as the pattern implements the most
+ restrictive @code{__ATOMIC_SEQ_CST} model. Any of the other memory models
+ will execute correctly with this memory model, but they may not execute as
+ efficiently as they could with a more appropriate implementation of the
+ relaxed requirements.
+
+ Note that the C++11 standard allows for the memory model parameter to be
+ determined at runtime rather than at compile time. These built-in
+ functions will map any runtime value to @code{__ATOMIC_SEQ_CST} rather
+ than invoke a runtime library call or inline a switch statement. This is
+ standard compliant, safe, and the simplest approach for now.
+
+ @deftypefn {Built-in Function} @var{type} __atomic_load_n (@var{type} *ptr, int memmodel)
+ This built-in function implements an atomic load operation. It returns the
+ contents of @code{*@var{ptr}}.
+
+ The valid memory model variants are
+ @code{__ATOMIC_RELAXED}, @code{__ATOMIC_SEQ_CST}, @code{__ATOMIC_ACQUIRE},
+ and @code{__ATOMIC_CONSUME}.
+
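+ For example, an acquire load of an assumed shared flag @code{done}
+ (with @code{consume_data} standing in for subsequent work) might be
+ written:
+ 
+ @smallexample
+ if (__atomic_load_n (&done, __ATOMIC_ACQUIRE))
+   consume_data ();   /* Data written before the matching release
+                         store is now visible here.  */
+ @end smallexample
+ 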
+ @end deftypefn
+
+ @deftypefn {Built-in Function} void __atomic_load (@var{type} *ptr, @var{type} *ret, int memmodel)
+ This is the generic version of an atomic load. It will return the
+ contents of @code{*@var{ptr}} in @code{*@var{ret}}.
+
+ @end deftypefn
+
+ @deftypefn {Built-in Function} void __atomic_store_n (@var{type} *ptr, @var{type} val, int memmodel)
+ This built-in function implements an atomic store operation. It writes
+ @code{@var{val}} into @code{*@var{ptr}}. On targets which are limited,
+ 0 may be the only valid value. This mimics the behaviour of
+ @code{__sync_lock_release} on such hardware.
+
+ The valid memory model variants are
+ @code{__ATOMIC_RELAXED}, @code{__ATOMIC_SEQ_CST}, and @code{__ATOMIC_RELEASE}.
+
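+ As a sketch pairing with the acquire load above (@code{data} and
+ @code{done} are assumed shared variables):
+ 
+ @smallexample
+ data = 42;                                 /* ordinary store  */
+ __atomic_store_n (&done, 1, __ATOMIC_RELEASE);
+ @end smallexample
+ 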
+ @end deftypefn
+
+ @deftypefn {Built-in Function} void __atomic_store (@var{type} *ptr, @var{type} *val, int memmodel)
+ This is the generic version of an atomic store. It will store the value
+ of @code{*@var{val}} into @code{*@var{ptr}}.
+
+ @end deftypefn
+
+ @deftypefn {Built-in Function} @var{type} __atomic_exchange_n (@var{type} *ptr, @var{type} val, int memmodel)
+ This built-in function implements an atomic exchange operation. It writes
+ @var{val} into @code{*@var{ptr}}, and returns the previous contents of
+ @code{*@var{ptr}}.
+
+ On targets which are limited, a value of 1 may be the only valid value
+ written. This mimics the behaviour of @code{__sync_lock_test_and_set} on
+ such hardware.
+
+ The valid memory model variants are
+ @code{__ATOMIC_RELAXED}, @code{__ATOMIC_SEQ_CST}, @code{__ATOMIC_ACQUIRE},
+ @code{__ATOMIC_RELEASE}, and @code{__ATOMIC_ACQ_REL}.
+
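+ For instance, a minimal test-and-set spin lock sketch (the shared
+ variable @code{lock} is hypothetical) can be built from this function:
+ 
+ @smallexample
+ while (__atomic_exchange_n (&lock, 1, __ATOMIC_ACQUIRE))
+   ;   /* Spin until the previous value was 0.  */
+ /* ... critical section ...  */
+ __atomic_store_n (&lock, 0, __ATOMIC_RELEASE);
+ @end smallexample
+ 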
+ @end deftypefn
+
+ @deftypefn {Built-in Function} void __atomic_exchange (@var{type} *ptr, @var{type} *val, @var{type} *ret, int memmodel)
+ This is the generic version of an atomic exchange. It will store the
+ contents of @code{*@var{val}} into @code{*@var{ptr}}. The original value
+ of @code{*@var{ptr}} will be copied into @code{*@var{ret}}.
+
+ @end deftypefn
+
+ @deftypefn {Built-in Function} bool __atomic_compare_exchange_n (@var{type} *ptr, @var{type} *expected, @var{type} desired, bool weak, int success_memmodel, int failure_memmodel)
+ This built-in function implements an atomic compare and exchange operation.
+ This compares the contents of @code{*@var{ptr}} with the contents of
+ @code{*@var{expected}} and if equal, writes @var{desired} into
+ @code{*@var{ptr}}. If they are not equal, the current contents of
+ @code{*@var{ptr}} are written into @code{*@var{expected}}.
+
+ True is returned if @var{desired} is written into
+ @code{*@var{ptr}} and the execution is considered to conform to the
+ memory model specified by @var{success_memmodel}. There are no
+ restrictions on what memory model can be used here.
+
+ False is returned otherwise, and the execution is considered to conform
+ to @var{failure_memmodel}. This memory model cannot be
+ @code{__ATOMIC_RELEASE} or @code{__ATOMIC_ACQ_REL}. It also cannot be a
+ stronger model than that specified by @var{success_memmodel}.
+
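+ A typical retry loop, sketched here for an assumed shared counter
+ @code{v} being atomically doubled, can pass true for @var{weak} since a
+ spurious failure is simply retried:
+ 
+ @smallexample
+ int old = __atomic_load_n (&v, __ATOMIC_RELAXED);
+ while (!__atomic_compare_exchange_n (&v, &old, 2 * old, true,
+                                      __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
+   ;   /* On failure, old has been updated with the current value.  */
+ @end smallexample
+ 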
+ @end deftypefn
+
+ @deftypefn {Built-in Function} bool __atomic_compare_exchange (@var{type} *ptr, @var{type} *expected, @var{type} *desired, bool weak, int success_memmodel, int failure_memmodel)
+ This built-in function implements the generic version of
+ @code{__atomic_compare_exchange}. The function is virtually identical to
+ @code{__atomic_compare_exchange_n}, except the desired value is also a
+ pointer.
+
+ @end deftypefn
+
+ @deftypefn {Built-in Function} @var{type} __atomic_add_fetch (@var{type} *ptr, @var{type} val, int memmodel)
+ @deftypefnx {Built-in Function} @var{type} __atomic_sub_fetch (@var{type} *ptr, @var{type} val, int memmodel)
+ @deftypefnx {Built-in Function} @var{type} __atomic_and_fetch (@var{type} *ptr, @var{type} val, int memmodel)
+ @deftypefnx {Built-in Function} @var{type} __atomic_xor_fetch (@var{type} *ptr, @var{type} val, int memmodel)
+ @deftypefnx {Built-in Function} @var{type} __atomic_or_fetch (@var{type} *ptr, @var{type} val, int memmodel)
+ @deftypefnx {Built-in Function} @var{type} __atomic_nand_fetch (@var{type} *ptr, @var{type} val, int memmodel)
+ These built-in functions perform the operation suggested by the name, and
+ return the result of the operation. That is,
+
+ @smallexample
+ @{ *ptr @var{op}= val; return *ptr; @}
+ @end smallexample
+
+ All memory models are valid.
+
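+ For example, atomically setting bits in an assumed shared flag word and
+ using the updated value (@code{flags} and @code{F_READY} are
+ hypothetical):
+ 
+ @smallexample
+ int newflags = __atomic_or_fetch (&flags, F_READY, __ATOMIC_ACQ_REL);
+ @end smallexample
+ 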
+ @end deftypefn
+
+ @deftypefn {Built-in Function} @var{type} __atomic_fetch_add (@var{type} *ptr, @var{type} val, int memmodel)
+ @deftypefnx {Built-in Function} @var{type} __atomic_fetch_sub (@var{type} *ptr, @var{type} val, int memmodel)
+ @deftypefnx {Built-in Function} @var{type} __atomic_fetch_and (@var{type} *ptr, @var{type} val, int memmodel)
+ @deftypefnx {Built-in Function} @var{type} __atomic_fetch_xor (@var{type} *ptr, @var{type} val, int memmodel)
+ @deftypefnx {Built-in Function} @var{type} __atomic_fetch_or (@var{type} *ptr, @var{type} val, int memmodel)
+ @deftypefnx {Built-in Function} @var{type} __atomic_fetch_nand (@var{type} *ptr, @var{type} val, int memmodel)
+ These built-in functions perform the operation suggested by the name, and
+ return the value that had previously been in @code{*@var{ptr}}. That is,
+
+ @smallexample
+ @{ tmp = *ptr; *ptr @var{op}= val; return tmp; @}
+ @end smallexample
+
+ All memory models are valid.
+
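+ These differ from the @samp{__atomic_@var{op}_fetch} functions only in
+ the value returned; for an assumed shared @code{counter}:
+ 
+ @smallexample
+ int prev = __atomic_fetch_add (&counter, 1, __ATOMIC_RELAXED); /* old */
+ int next = __atomic_add_fetch (&counter, 1, __ATOMIC_RELAXED); /* new */
+ @end smallexample
+ 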
+ @end deftypefn
+
+ @deftypefn {Built-in Function} void __atomic_thread_fence (int memmodel)
+
+ This built-in function acts as a synchronization fence between threads
+ based on the specified memory model.
+
+ All memory models are valid.
+
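+ As an illustrative sketch, a release fence can be used to order an
+ ordinary store before a later relaxed atomic store (the names are
+ hypothetical):
+ 
+ @smallexample
+ data = 42;                                 /* ordinary store  */
+ __atomic_thread_fence (__ATOMIC_RELEASE);
+ __atomic_store_n (&done, 1, __ATOMIC_RELAXED);
+ @end smallexample
+ 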
+ @end deftypefn
+
+ @deftypefn {Built-in Function} void __atomic_signal_fence (int memmodel)
+
+ This built-in function acts as a synchronization fence between a thread
+ and signal handlers in that same thread, based on the specified memory
+ model.
+
+ All memory models are valid.
+
+ @end deftypefn
+
+ @deftypefn {Built-in Function} bool __atomic_always_lock_free (size_t size)
+
+ This built-in function returns true if objects of @var{size} bytes will
+ always generate lock free atomic instructions for the target architecture.
+ Otherwise false is returned.
+ 
+ @var{size} must resolve to a compile-time constant.
+ 
+ @smallexample
+ if (__atomic_always_lock_free (sizeof (long long)))
+ @end smallexample
+
+ @end deftypefn
+
+ @deftypefn {Built-in Function} bool __atomic_is_lock_free (size_t size)
+
+ This built-in function returns true if objects of @var{size} bytes will
+ always generate lock free atomic instructions for the target architecture.
+ If it is not known to be lock free, a call is made to a runtime routine
+ named @code{__atomic_is_lock_free}.
+
+ @end deftypefn
+
@node Object Size Checking
@section Object Size Checking Builtins
@findex __builtin_object_size
Index: invoke.texi
===================================================================
*** invoke.texi (revision 180839)
--- invoke.texi (working copy)
*************** The maximum number of conditional stores
*** 9155,9165 ****
--- 9155,9180 ----
if either vectorization (@option{-ftree-vectorize}) or if-conversion
(@option{-ftree-loop-if-convert}) is disabled. The default is 2.
+ @item allow-load-data-races
+ Allow optimizers to introduce new data races on loads.
+ Set to 1 to allow, otherwise to 0. This option is enabled by default
+ unless implicitly set by the @option{-fmemory-model=} option.
+
@item allow-store-data-races
Allow optimizers to introduce new data races on stores.
Set to 1 to allow, otherwise to 0. This option is enabled by default
unless implicitly set by the @option{-fmemory-model=} option.
+ @item allow-packed-load-data-races
+ Allow optimizers to introduce new data races on packed data loads.
+ Set to 1 to allow, otherwise to 0. This option is enabled by default
+ unless implicitly set by the @option{-fmemory-model=} option.
+
+ @item allow-packed-store-data-races
+ Allow optimizers to introduce new data races on packed data stores.
+ Set to 1 to allow, otherwise to 0. This option is enabled by default
+ unless implicitly set by the @option{-fmemory-model=} option.
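+ 
+ These parameters may be set explicitly on the command line; for example
+ (the source file name here is a placeholder):
+ 
+ @smallexample
+ gcc -O2 --param allow-packed-store-data-races=0 foo.c
+ @end smallexample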
+
@item case-values-threshold
The smallest number of different values for which it is best to use a
jump-table instead of a tree of conditional branches. If the value is
*************** This option will enable GCC to use CMPXC
*** 13016,13022 ****
CMPXCHG16B allows for atomic operations on 128-bit double quadword (or oword)
data types. This is useful for high resolution counters that could be updated
by multiple processors (or cores). This instruction is generated as part of
! atomic built-in functions: see @ref{Atomic Builtins} for details.
@item -msahf
@opindex msahf
--- 13031,13038 ----
CMPXCHG16B allows for atomic operations on 128-bit double quadword (or oword)
data types. This is useful for high resolution counters that could be updated
by multiple processors (or cores). This instruction is generated as part of
! atomic built-in functions: see @ref{__sync Builtins} or
! @ref{__atomic Builtins} for details.
@item -msahf
@opindex msahf
Index: md.texi
===================================================================
*** md.texi (revision 180839)
--- md.texi (working copy)
*************** released only after all previous memory
*** 5628,5633 ****
--- 5628,5782 ----
If this pattern is not defined, then a @code{memory_barrier} pattern
will be emitted, followed by a store of the value to the memory operand.
+ @cindex @code{atomic_compare_and_swap@var{mode}} instruction pattern
+ @item @samp{atomic_compare_and_swap@var{mode}}
+ This pattern, if defined, emits code for an atomic compare-and-swap
+ operation with memory model semantics. Operand 2 is the memory on which
+ the atomic operation is performed. Operand 0 is an output operand which
+ is set to true or false based on whether the operation succeeded. Operand
+ 1 is an output operand which is set to the contents of the memory before
+ the operation was attempted. Operand 3 is the value that is expected to
+ be in memory. Operand 4 is the value to put in memory if the expected
+ value is found there. Operand 5 is set to 1 if this compare and swap is to
+ be treated as a weak operation. Operand 6 is the memory model to be used
+ if the operation is a success. Operand 7 is the memory model to be used
+ if the operation fails.
+
+ If the memory referred to by operand 2 contains the value in operand 3,
+ then operand 4 is stored in that memory and fencing based on the memory
+ model in operand 6 is issued.
+ 
+ If the memory referred to by operand 2 does not contain the value in
+ operand 3, then fencing based on the memory model in operand 7 is issued.
+
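+ Ignoring the fencing and the atomicity of the sequence, the operation
+ behaves roughly like this C sketch, where @code{op@var{N}} stands for
+ operand @var{N}:
+ 
+ @smallexample
+ op1 = *op2;                      /* capture original memory contents  */
+ if (op1 == op3)                  /* expected value found?  */
+   @{ *op2 = op4; op0 = true; @}  /* store the desired value  */
+ else
+   op0 = false;
+ @end smallexample
+ 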
+ If a target does not support weak compare-and-swap operations, or the port
+ elects not to implement weak operations, the argument in operand 5 can be
+ ignored. Note that a strong implementation must be provided in either case.
+
+ If this pattern is not provided, the @code{__atomic_compare_exchange}
+ built-in functions will utilize the legacy @code{sync_compare_and_swap}
+ pattern with an @code{__ATOMIC_SEQ_CST} memory model.
+
+ @cindex @code{atomic_load@var{mode}} instruction pattern
+ @item @samp{atomic_load@var{mode}}
+ This pattern implements an atomic load operation with memory model
+ semantics. Operand 1 is the memory address being loaded from. Operand 0
+ is the result of the load. Operand 2 is the memory model to be used for
+ the load operation.
+
+ If not present, the @code{__atomic_load} built-in function will either
+ resort to a normal load with memory barriers, or a compare-and-swap
+ operation if a normal load would not be atomic.
+
+ @cindex @code{atomic_store@var{mode}} instruction pattern
+ @item @samp{atomic_store@var{mode}}
+ This pattern implements an atomic store operation with memory model
+ semantics. Operand 0 is the memory address being stored to. Operand 1
+ is the value to be written. Operand 2 is the memory model to be used for
+ the operation.
+
+ If not present, the @code{__atomic_store} built-in function will attempt to
+ perform a normal store and surround it with any required memory fences. If
+ the store would not be atomic, then an @code{__atomic_exchange} is
+ attempted with the result being ignored.
+
+ @cindex @code{atomic_exchange@var{mode}} instruction pattern
+ @item @samp{atomic_exchange@var{mode}}
+ This pattern implements an atomic exchange operation with memory model
+ semantics. Operand 1 is the memory location the operation is performed on.
+ Operand 0 is an output operand which is set to the original value contained
+ in the memory pointed to by operand 1. Operand 2 is the value to be
+ stored. Operand 3 is the memory model to be used.
+
+ If this pattern is not present, the built-in function
+ @code{__atomic_exchange} will attempt to perform the operation with a
+ compare-and-swap loop.
+
+ @cindex @code{atomic_add@var{mode}} instruction pattern
+ @cindex @code{atomic_sub@var{mode}} instruction pattern
+ @cindex @code{atomic_or@var{mode}} instruction pattern
+ @cindex @code{atomic_and@var{mode}} instruction pattern
+ @cindex @code{atomic_xor@var{mode}} instruction pattern
+ @cindex @code{atomic_nand@var{mode}} instruction pattern
+ @item @samp{atomic_add@var{mode}}, @samp{atomic_sub@var{mode}}
+ @itemx @samp{atomic_or@var{mode}}, @samp{atomic_and@var{mode}}
+ @itemx @samp{atomic_xor@var{mode}}, @samp{atomic_nand@var{mode}}
+
+ These patterns emit code for an atomic operation on memory with memory
+ model semantics. Operand 0 is the memory on which the atomic operation is
+ performed. Operand 1 is the second operand to the binary operator.
+ Operand 2 is the memory model to be used by the operation.
+
+ If these patterns are not defined, attempts will be made to use legacy
+ @code{sync} patterns, or equivalent patterns which return a result. If
+ none of these are available, a compare-and-swap loop will be used.
+
+ @cindex @code{atomic_fetch_add@var{mode}} instruction pattern
+ @cindex @code{atomic_fetch_sub@var{mode}} instruction pattern
+ @cindex @code{atomic_fetch_or@var{mode}} instruction pattern
+ @cindex @code{atomic_fetch_and@var{mode}} instruction pattern
+ @cindex @code{atomic_fetch_xor@var{mode}} instruction pattern
+ @cindex @code{atomic_fetch_nand@var{mode}} instruction pattern
+ @item @samp{atomic_fetch_add@var{mode}}, @samp{atomic_fetch_sub@var{mode}}
+ @itemx @samp{atomic_fetch_or@var{mode}}, @samp{atomic_fetch_and@var{mode}}
+ @itemx @samp{atomic_fetch_xor@var{mode}}, @samp{atomic_fetch_nand@var{mode}}
+
+ These patterns emit code for an atomic operation on memory with memory
+ model semantics, and return the original value. Operand 0 is an output
+ operand which contains the value of the memory location before the
+ operation was performed. Operand 1 is the memory on which the atomic
+ operation is performed. Operand 2 is the second operand to the binary
+ operator. Operand 3 is the memory model to be used by the operation.
+
+ If these patterns are not defined, attempts will be made to use legacy
+ @code{sync} patterns. If none of these are available, a compare-and-swap
+ loop will be used.
+
+ @cindex @code{atomic_add_fetch@var{mode}} instruction pattern
+ @cindex @code{atomic_sub_fetch@var{mode}} instruction pattern
+ @cindex @code{atomic_or_fetch@var{mode}} instruction pattern
+ @cindex @code{atomic_and_fetch@var{mode}} instruction pattern
+ @cindex @code{atomic_xor_fetch@var{mode}} instruction pattern
+ @cindex @code{atomic_nand_fetch@var{mode}} instruction pattern
+ @item @samp{atomic_add_fetch@var{mode}}, @samp{atomic_sub_fetch@var{mode}}
+ @itemx @samp{atomic_or_fetch@var{mode}}, @samp{atomic_and_fetch@var{mode}}
+ @itemx @samp{atomic_xor_fetch@var{mode}}, @samp{atomic_nand_fetch@var{mode}}
+
+ These patterns emit code for an atomic operation on memory with memory
+ model semantics and return the result after the operation is performed.
+ Operand 0 is an output operand which contains the value after the
+ operation. Operand 1 is the memory on which the atomic operation is
+ performed. Operand 2 is the second operand to the binary operator.
+ Operand 3 is the memory model to be used by the operation.
+
+ If these patterns are not defined, attempts will be made to use legacy
+ @code{sync} patterns, or equivalent patterns which return the result
+ before the operation, followed by the arithmetic operation required to
+ produce the result. If none of these are available, a compare-and-swap
+ loop will be used.
+
+ @cindex @code{mem_thread_fence@var{mode}} instruction pattern
+ @item @samp{mem_thread_fence@var{mode}}
+ This pattern emits code required to implement a thread fence with
+ memory model semantics. Operand 0 is the memory model to be used.
+
+ If this pattern is not specified, all memory models except
+ @code{__ATOMIC_RELAXED} will result in issuing a @code{sync_synchronize}
+ barrier pattern.
+
+ @cindex @code{mem_signal_fence@var{mode}} instruction pattern
+ @item @samp{mem_signal_fence@var{mode}}
+ This pattern emits code required to implement a signal fence with
+ memory model semantics. Operand 0 is the memory model to be used.
+
+ This pattern should impact the compiler optimizers the same way that
+ @code{mem_thread_fence} does, but it does not need to issue any barrier
+ instructions.
+
+ If this pattern is not specified, all memory models except
+ @code{__ATOMIC_RELAXED} will result in issuing a @code{sync_synchronize}
+ barrier pattern.
+
@cindex @code{stack_protect_set} instruction pattern
@item @samp{stack_protect_set}