Hey Chris! We chatted briefly about this patch during the devmtg, and I wanted to follow up on it here. What follows is a very brief summary of our discussion as I recall it -- chime in if I missed anything important.

On Fri, Oct 5, 2012 at 5:09 PM, Chris Lattner <[email protected]> wrote:

> Hi Chandler,
>
> Sorry for the delay, I still haven't had a chance to look at your patch.
> That said, here are some high level comments based on my understanding of
> it:
>
> 1) I'm not worried about "weird" integer sizes in register variables. LLVM
> is pretty good at promoting stuff to "native" integers, and if there are
> bugs handling them, we want to know anyway. If this is a convenient way to
> model arithmetic on an i56 bitfield, then that's cool with me.

Cool, no disagreement here.
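To make that concrete, here's a rough sketch of the register-level IR this enables -- the i56 width comes from your example, but the function and the mask for a hypothetical low 40-bit field are invented for illustration:

define i56 @bump_field(i56 %unit) {
entry:
  ; Extract a hypothetical 40-bit field from the 56-bit storage unit...
  %field = and i56 %unit, 1099511627775    ; 2^40 - 1
  ; ...and do the arithmetic at the "weird" width. The legalizer promotes
  ; i56 to a native width (presumably i64 here) during codegen.
  %inc = add i56 %field, 1
  ret i56 %inc
}

As long as values like this stay in registers, any promotion bugs we trip over are bugs we'd want to find anyway.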
> 2) I don't like the idea of "weird" integer sizes on loads and stores to
> struct memory though. One of the major things that the current bitfield
> design is intended to encourage is to allow CSE and DSE of accesses to
> struct fields. It currently does this by widening out loads (too much!) but
> if the widening is the problem, I'd rather see this expanded out into a
> series of loads (e.g. an i8 load and an i16 load, instead of an i24 load).
> The reason for this is that the optimizer won't be splitting up i24 loads -
> codegen will, and exposing truth to the mid-level optimizer is the right
> thing to do.

This is, I think, the interesting question. We discussed it quite a bit in person, and I'm not going to try to repeat all of that here. My essential argument is that, given the way the memory model (both C++'s and LLVM's) works, it is specifically useful for the frontend to emit maximally widened and/or merged stores. This exposes the maximal width of store which will not introduce a data race. Without that information, the optimizer can't freely use natural-width operations in many cases, and it becomes harder for it to combine loads and stores to unrelated parts of a bitfield, because they may appear to need to stay separate to avoid introducing data races.

One possibility we specifically discussed was pre-splitting the wide load and store operations into "natural" integer widths for the target. This has (in my mind) two significant downsides:

1) LLVM will hoist some of these operations across basic blocks, which will prevent the target from merging them when profitable (say, when the memory turns out to be aligned and it can do a 128-bit store).

2) It duplicates the logic for intelligently decomposing a wide integer load or store between the frontend and the codegen layer in LLVM. This is particularly frustrating because the codegen layer will in many cases do a *better* job of decomposing things than the frontend will, by taking advantage of the exact instructions available and the final alignment of the memory location. (The sketch after this list tries to make the contrast concrete.)
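Here's the contrast I mean, for a hypothetical 3-byte (i24) bitfield storage unit. Everything concrete in this sketch -- the function names, the alignment, and the little-endian placement of the low bytes -- is invented for illustration:

; (a) What the patch emits: one maximally widened store. The optimizer
; can see that all 24 bits may legally be written as a unit, so CSE/DSE
; can treat the storage unit as a single memory location.
define void @store_unit_wide(i24* %p, i24 %v) {
entry:
  store i24 %v, i24* %p, align 4
  ret void
}

; (b) The pre-split alternative: an i16 store plus an i8 store (assuming
; a little-endian layout). The race-freedom of the combined access is no
; longer visible, and the frontend has committed to a decomposition that
; codegen could have chosen better once the final alignment is known.
define void @store_unit_split(i24* %p, i24 %v) {
entry:
  %lo = trunc i24 %v to i16
  %hi.bits = lshr i24 %v, 16
  %hi = trunc i24 %hi.bits to i8
  %p.lo = bitcast i24* %p to i16*
  store i16 %lo, i16* %p.lo, align 4
  %p.raw = bitcast i24* %p to i8*
  %p.hi = getelementptr inbounds i8* %p.raw, i64 2
  store i8 %hi, i8* %p.hi, align 2
  ret void
}

And once LLVM hoists the two pieces of (b) into different blocks, the chance for the target to merge them back into a single wide access is effectively gone.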
> 3) This is the sort of change that needs some numbers to back it up.
> Performance results would be best, but a decent proxy for them in this case
> would be to look at the code size of functions in 176.gcc or something else
> that uses a bunch of bitfields.

I think this is crucial. =]

I've run the LNT test suite with and without this patch, and I get no regressions (or improvements) in performance. That said, my box is quite noisy. If I can submit the patch, the LNT build bots will give us much more stable data, I suspect.

I don't have 176.gcc handy, but I looked at the binary sizes for every program in the LNT test suite. Here are the benchmarks which changed by at least 1%. The first column is the benchmark, then the old size of the '.text' section, then the new size, and finally "(new - old) / old" to show the % change:

MultiSource/Benchmarks/Fhourstones-3.1/Output/fhourstones3.1.simple, 4872, 4984, 0.022988505747126436
SingleSource/UnitTests/Output/2009-04-16-BitfieldInitialization.simple, 952, 888, -0.067226890756302518
SingleSource/UnitTests/Output/ms_struct-bitfield-init-1.simple, 664, 680, 0.024096385542168676
SingleSource/UnitTests/Output/2006-01-23-UnionInit.simple, 904, 920, 0.017699115044247787
SingleSource/Regression/C++/Output/2011-03-28-Bitfield.simple, 568, 552, -0.028169014084507043

So, one benchmark grew by 2.2%, but it's a tiny benchmark to start with, so this isn't terribly interesting. Of the bitfield-heavy single-source tests, we have 1.7%, 2.4%, -2.8%, and -6.7%. The growth here doesn't seem worrisome at all.

Any other benchmarking you'd like me to do to validate this?
