On 2/11/12, Jeffrey Yasskin <[email protected]> wrote:
> On Wed, Oct 12, 2011 at 11:55 AM, Jeffrey Yasskin <[email protected]> wrote:
>> [+ Lawrence who's been driving the ABI-compatibility design. Context at
>> http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20111010/047614.html]
>>
>> On Wed, Oct 12, 2011 at 10:57 AM, John McCall <[email protected]> wrote:
>>> On Oct 12, 2011, at 9:03 AM, Jeffrey Yasskin wrote:
>>>> On Wed, Oct 12, 2011 at 6:31 AM, Andrew MacLeod <[email protected]> wrote:
>>>>> - language atomic types up to 16 bytes should be padded to an
>>>>>   appropriate size, and aligned properly.
>>>>> - if memory matching one of the 5 'optimized' sizes isn't aligned
>>>>>   properly, results are undefined.
>>>>> - if the size does not match one of the 5 specific routines, then the
>>>>>   library generic ABI can handle it. There's no alignment guarantees,
>>>>>   so I presume it would end up being a locked implementation using
>>>>>   hash tables and addresses or something.
>>>>
>>>> The ABI library needs to demand alignment guarantees, or have them
>>>> passed in, or it won't be able to support larger lock-free sizes on
>>>> new architectures.
>>>
>>> How aggressive are you suggesting we be about this? If I make this type
>>> atomic:
>>> struct { float values[5]; };
>>> do we really increase its size and alignment up to 32 bytes in the wild
>>> hope that the architecture will add 32-byte atomics someday? If so,
>>> what's the limit? If not, why is 16 the limit?
>>>
>>
>> The goal was that architectures could add new atomic instructions
>> without forcing an ABI change. Changing the size of atomic<FiveFloats>
>> would be an ABI change, so we should try to plan ahead to avoid it.
>> All the existing atomics have required alignments equal to their
>> sizes, and whole-cacheline cmpxchg seems like a plausible future
>> instruction and would also require alignment equal to the size, so
>> that's what I've been suggesting.
>
> I think the recent announcement at
> http://software.intel.com/en-us/blogs/2012/02/07/transactional-synchronization-in-haswell/,
> that Intel plans to implement hardware transactions by making locked
> regions cheaper, undermines my and Lawrence's position here. If these
> new instructions work like they appear to, it'll be possible to
> implement types with arbitrary sizes and alignments as cheaply as the
> current lock-free operations, and it seems unlikely to me that Intel
> would add larger lock-free operations once they have these
> transactional instructions.

My guess is that they are exploiting cache line ownership. I expect
there is a limit on the number of lines, but not small enough to
affect 'reasonable' atomic types. Crossing a cache boundary will
require holding both lines. If there is any false sharing on those
lines, the performance could suffer badly.

One advantage to super-aligning is that the probability of false
sharing goes down. I suppose we could pass that problem back to the
user, which in general they must deal with anyway. However, there
is presently no C++ standard mechanism to respect cache line size
and alignment. Forcing a bunch of platform-dependent code to address
the performance doesn't seem like a good thing to do. Standardizing
cache line size queries seems like a good way to unproductively
spend lots of committee time. Grumble.

-- 
Lawrence Crowl
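
A minimal sketch of the super-aligning idea discussed above, for the
FiveFloats type from the quoted thread. It assumes a 64-byte cache
line; that number is a platform-dependent guess, and the names
kAssumedCacheLine, PaddedFiveFloats and straddles_line are hypothetical,
introduced here only for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// ASSUMPTION: 64-byte cache lines. The standard does not provide this
// value, so it has to be supplied per platform.
constexpr std::size_t kAssumedCacheLine = 64;

// The 20-byte example type from the quoted discussion.
struct FiveFloats { float values[5]; };

// Hypothetical super-aligned wrapper: alignas raises the alignment to one
// assumed cache line, and the compiler pads the size up to a multiple of it.
struct alignas(kAssumedCacheLine) PaddedFiveFloats {
    FiveFloats payload;
};

static_assert(alignof(PaddedFiveFloats) == kAssumedCacheLine,
              "object starts on an assumed line boundary");
static_assert(sizeof(PaddedFiveFloats) == kAssumedCacheLine,
              "object occupies exactly one assumed line");

// True if the byte range [addr, addr + size) touches more than one
// assumed cache line. Requires size > 0.
constexpr bool straddles_line(std::uintptr_t addr, std::size_t size) {
    return (addr / kAssumedCacheLine) !=
           ((addr + size - 1) / kAssumedCacheLine);
}
```

With this layout, an object never crosses an assumed line boundary and
neighbouring array elements never share a line, at the cost of 44 bytes
of padding per object. An unpadded FiveFloats placed at offset 60 would
occupy two lines under the same assumption.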
