On Fri, Aug 31, 2012 at 3:35 PM, Daniel Dunbar <[email protected]> wrote:
> On Tue, Aug 28, 2012 at 5:33 PM, Chandler Carruth <[email protected]> wrote:
> > On Tue, Aug 28, 2012 at 5:02 PM, John McCall <[email protected]> wrote:
> >>
> >> On Aug 28, 2012, at 4:34 PM, Chandler Carruth wrote:
> >>
> >> On Tue, Aug 28, 2012 at 3:52 PM, John McCall <[email protected]> wrote:
> >>>
> >>> On Aug 28, 2012, at 3:18 PM, Chandler Carruth wrote:
> >>> > I've added an explanatory comment. Hopefully won't get stale.
> >>> >
> >>> > Any other comments? I'd love to get this into the tree as we're
> >>> > seeing actual races in the wild due to it....
> >>>
> >>> If you would like a fast review, perhaps a more minimal change would
> >>> have been in order. We're not going to rush into a total rewrite of
> >>> bitfield IR-generation, especially not one that introduces a host of
> >>> new assumptions and dependencies on a very general but
> >>> poorly-specified IR feature that has not previously been used in
> >>> code with binary compatibility requirements.
> >>
> >> So, first off, sorry for being in a hurry. I'm not trying to get a
> >> rushed review, just hoping to get review sooner rather than later. =D
> >> Anyways, I would *really* like thorough reviews of this. I'm very
> >> aware and sensitive to the importance of this code. In fact, that's
> >> why I'm working on it.
> >>
> >> Also, I too would have preferred a more minimal change. None seemed
> >> readily available when both I and Richard Smith were looking at this
> >> in detail.
> >>
> >> The primary idea we had was to bound the strided accesses to the size
> >> of the bitfield storage rather than just to the containing struct.
> >>
> >> Why not simply bound it to the offset of the next non-bitfield field
> >> (or, if there is none, the data size of the type)?
> >
> > You also have to bound the start, not just the stop. That means
> > reasonably often either issuing an unaligned load or peeling off the
> > unaligned chunks.
> > In my testing, this happened reasonably often due to the ABI layout
> > not requiring alignment in many cases, even when the type would
> > typically provide some.
> >
> >> Is that challenging for some reason I'm not understanding?
> >
> > No, it's not challenging. As I explained in my previous email, doing
> > this isn't hard, but in my testing it would very often result in
> > byte-by-byte loads and stores. There seemed to be very good reason to
> > not want that for performance reasons (allowing more aggressive
> > combining across the bitfield, etc.). What would be quite a bit more
> > code (not particularly challenging, just a lot of code) would be
> > peeling off the unaligned small chunks on the front and the back, and
> > using wide loads and stores for sections entirely within the
> > bitfield. It seemed like the *exact* code we would already need in
> > LLVM to lower these integers effectively, and so it seemed preferable
> > for the frontend to emit a simple single integer load and store, and
> > teach the backend where necessary to handle it.
> >
> >> I also happened to be hacking on SROA recently, and that pass was in
> >> fact forming similar IR structures you're worried about: arbitrarily
> >> sized integers with bit insertion and extraction as a "vector" of
> >> "bits". I'm not claiming that it produced the exact same structures
> >> or that there aren't bugs here. I would be more than happy to
> >> provide extra testing and try to flush out any crufty bits of LLVM
> >> here. I'm just saying I don't think this pattern is *completely*
> >> unused...
> >>
> >> Anyways, does even the vague approach seem good? Should I work on a
> >> different strategy? Are there particular tests that I might be able
> >> to run? Any of this would be helpful feedback for me here.
> >>
> >> I asked Daniel to make a detailed review, since he wrote the current
> >> implementation and did a lot of investigation for it.
> >>
> >> The things that worry me most are:
> >>
> >> (1) How well-specified and -tested are the paths for non-power-of-2
> >> integer types in LLVM? Do we actually promise that, say, storing to
> >> an i17 will not touch the fourth byte? Do we provide reasonably
> >> portable behavior across targets, or is this an instance of the x86
> >> backend having received a lot of attention and everything else being
> >> a festering morass of latent issues? I know that SROA's behavior has
> >> exposed bugs before (IIRC with the occasional response of making
> >> SROA less aggressive), not least of which because SROA's behavior
> >> doesn't really kick in a lot.
> >
> > Sure, this is a great point. I'm happy to do any testing I can here.
> > One thing that I think I've already done is ensure that we only
> > produce i8-multiples, not i17s and such. I'll double check, but I
> > would not expect to see any of those. My hope was that this would in
> > the worst case be trivially legalized to a sequence of i8 operations,
> > and in the best case, the optimal pattern.
> >
> >> (2) How limiting is this going to be for bitfields on "weird"
> >> platforms?
> >
> > I would not expect this to be a problem, but perhaps I don't know the
> > platforms where it will crop up. The intent is to treat this as a
> > literal bag-of-bits. Any platform where that works should be well
> > supported.
> >
> >> Do we get anything useful from this change on big-endian targets?
> >
> > I think so. The code is much less confusing to me now. That is, if I
> > understood the original code correctly. I'm not at all sure we have
> > enough test coverage of big-endian IR generation to catch bugs, and I
> > know that I'm not an expert on their ABI requirements. Maybe someone
> > can tell me what to test for or look for in the output?
> >
> >> Will we find ourselves completely reimplementing it to support
> >> MS-style packed bitfields?
> > I don't know anything about MS-style packed bitfields, but I'll go
> > digging....
> >
> >> When I surveyed a few other people for their first impressions
> >> (including Chris and Daniel), there was also a lot of concern that
> >> the backend will often not be able to widen these loads and stores
> >> because the high-level information simply isn't there.
> >
> > Correct. The point is to *not* widen them. ;]
> >
> > From a memory model perspective, the frontend should emit the widest
> > possible loads and stores. That tells the backend it has exactly that
> > much latitude. This change actually helps this in several cases by
> > emitting *wider* loads and stores and thus clarifying that there
> > cannot be a race across that memory.
>
> We also want to know the optimizer will narrow these when necessary
> (the current code also has that problem, but to a less severe degree).
> If you are interested, here is a test case we don't currently
> subsequently narrow:
> --
> #pragma pack(1)
> struct s0 {
>   unsigned a : 9;
>   unsigned b0 : 5;
>   unsigned b1 : 5;
>   unsigned b2 : 5;
>   unsigned b3 : 5;
>   unsigned b4 : 5;
>   unsigned b5 : 5;
>   unsigned b6 : 5;
>   unsigned b7 : 5;
>   unsigned b8 : 5;
>   unsigned b9 : 5;
>   unsigned b10 : 5;
>   unsigned b11 : 5;
>   char c;
> };
>
> unsigned f0(struct s0 *p) {
>   return p->b8;
> }
> --
> which leaves us with a 64-bit load.

Yeah, this ends up with the same 64-bit load. Is that really bad? I
suppose it might be nice to teach the backend to narrow to a single-byte
load where we can, if only for the encoding shrink. I'll add this as a
test case if you like, but it'll have to be a split test case: one in
clang to produce the giant integer operation, and one in llvm to test
how it lowers it.
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
