craig.topper added a comment.

It looks like gcc implements additional bits that can be passed to 
__atomic_exchange and friends: __ATOMIC_HLE_ACQUIRE (1 << 16) and 
__ATOMIC_HLE_RELEASE (1 << 17). Basically they're using bits 16 and above in 
the order/memory_model as target specific flags. These constants are only 
defined when targeting X86, and they are validated to ensure they are only 
paired with the appropriate __ATOMIC_ACQUIRE or __ATOMIC_RELEASE or a stronger 
memory model.
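
A minimal sketch of the pairing check described above, assuming the gcc 
encoding values quoted in this comment. The SKETCH_ macro names and the 
sketch_hle_order_is_valid helper are hypothetical, not gcc or clang APIs, and 
the split into "low 16 bits = memory model" is an assumption. The __ATOMIC_* 
memory model macros are the ones gcc and clang predefine. The sketch only 
models the rule stated here: each hint needs its matching memory model or a 
stronger one.

```c
#include <stdbool.h>

/* Hypothetical names; the values follow the gcc encoding quoted above. */
#define SKETCH_HLE_ACQUIRE (1 << 16)
#define SKETCH_HLE_RELEASE (1 << 17)
/* Assumption: the low 16 bits carry the ordinary memory model. */
#define SKETCH_MODEL_MASK 0xFFFF

/* Returns true when every HLE hint in `order` is paired with the matching
 * memory model (acquire or release) or a stronger one. */
static bool sketch_hle_order_is_valid(int order)
{
    int model = order & SKETCH_MODEL_MASK;
    bool strong = model == __ATOMIC_ACQ_REL || model == __ATOMIC_SEQ_CST;

    if ((order & SKETCH_HLE_ACQUIRE) && model != __ATOMIC_ACQUIRE && !strong)
        return false;
    if ((order & SKETCH_HLE_RELEASE) && model != __ATOMIC_RELEASE && !strong)
        return false;
    return true;
}
```

For example, __ATOMIC_ACQUIRE | SKETCH_HLE_ACQUIRE passes this check, while 
__ATOMIC_RELAXED | SKETCH_HLE_ACQUIRE does not.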

As Reid said, it's technically safe to drop the hints sometimes, so we could 
use SubclassOptionalData or metadata. But losing them could have performance 
implications. If you lose an XACQUIRE, the lock won't be elided as the user 
expected. And if you keep an XACQUIRE but lose an XRELEASE, the processor will 
keep trying to speculate farther than it should until it eventually hits some 
random abort trigger and has to roll back to really acquiring the lock. Both of 
these would be surprising to the user, so we should make an effort not to lose 
the information as much as possible.
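
For context, a minimal sketch of the kind of user code whose hints are at 
stake, assuming the gcc-style constants discussed in this thread. The SKETCH_ 
macros and the sketch_lock/sketch_unlock helpers are hypothetical names; where 
the compiler does not define the gcc constants, the hints fall back to 0 and 
the code behaves as a plain spinlock.

```c
/* Use the gcc constants when the compiler provides them; otherwise fall back
 * to 0, which means "no hint". */
#ifdef __ATOMIC_HLE_ACQUIRE
#define SKETCH_HLE_ACQ __ATOMIC_HLE_ACQUIRE
#else
#define SKETCH_HLE_ACQ 0
#endif
#ifdef __ATOMIC_HLE_RELEASE
#define SKETCH_HLE_REL __ATOMIC_HLE_RELEASE
#else
#define SKETCH_HLE_REL 0
#endif

/* Acquire: the XACQUIRE hint asks the processor to elide the lock write. */
static void sketch_lock(int *lockvar)
{
    while (__atomic_exchange_n(lockvar, 1, __ATOMIC_ACQUIRE | SKETCH_HLE_ACQ))
        ; /* spin until the previous value was 0 */
}

/* Release: the XRELEASE hint marks the end of the elided region. */
static void sketch_unlock(int *lockvar)
{
    __atomic_store_n(lockvar, 0, __ATOMIC_RELEASE | SKETCH_HLE_REL);
}
```

If the compiler dropped only the hint in sketch_unlock, the acquire side would 
still start eliding, which is the mismatched case described above.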

Here's a start at an implementation proposal with some embedded questions.
- Add the X86 __ATOMIC_HLE_ACQUIRE/__ATOMIC_HLE_RELEASE constants, matching 
  the gcc encoding values.
- Write these intrinsics so that they pass these flags.
- Teach CGAtomic.cpp to lower those hints to whatever IR representation we 
  choose. If we choose SubclassOptionalData, we'll also need to add bitcode, 
  LL parsing, and printing support. Not sure what we would need for metadata.
- Add an HLE_ACQUIRE and HLE_RELEASE prefixed version of every instruction 
  that can be prefixed to the X86Instr*.td files, with appropriate isel 
  patterns. This matches what we do for LOCK already. This is probably 
  somewhere between 130 and 150 instructions after tblgen expansion for 
  operand sizes, immediate vs register, etc. Ideally we'd devise some way to 
  tag MachineInstr* with a lock, hle acquire, and hle release so that we 
  didn't need separate instruction opcodes for each permutation. But this 
  would just make things scale better; it is not required for functionality.
- Need a way to represent this in SelectionDAG so X86 specific code can create 
  the right target specific nodes. Do we have a metadata infrastructure there? 
  Or should we store it with the ordering in MachineMemOperand? Or in 
  SDNodeFlags?

Obviously a lot of that will take some time. I wonder if it makes sense to add 
the __ATOMIC_HLE_ACQUIRE/__ATOMIC_HLE_RELEASE constants, but ignore them in 
CGAtomic.cpp for now? We could then implement these intrinsics with the code 
we ultimately want to see there, but not implement the hints yet. Thoughts?
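
A rough illustration of what ignoring the hints could look like, written as 
standalone C rather than actual clang code. It assumes the gcc encoding, with 
the target specific hints in bits 16 and above, and the name 
sketch_drop_target_hints is hypothetical. The idea is simply to mask the hint 
bits off before the ordinary memory model is lowered.

```c
/* Assumption: bits 16 and up hold target specific hints (gcc encoding),
 * so the low 16 bits are the ordinary memory model. */
#define SKETCH_MODEL_MASK 0xFFFF

/* Strip the hint bits and keep only the memory model. */
static int sketch_drop_target_hints(int order)
{
    return order & SKETCH_MODEL_MASK;
}
```

With this, an order of __ATOMIC_ACQUIRE combined with the 1 << 16 hint lowers 
as plain __ATOMIC_ACQUIRE.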


Repository:
  rC Clang

https://reviews.llvm.org/D47672



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
