On Wed, Nov 28, 2012 at 4:49 PM, Eli Friedman <[email protected]> wrote:
> On Wed, Nov 28, 2012 at 4:36 PM, Chandler Carruth <[email protected]>
> wrote:
>> I don't have 176.gcc handy, but I looked at the binary sizes for every
>> program in the LNT test suite. Here are the benchmarks which changed
>> by at least 1%. First column is the benchmark, then the old size of
>> the '.text' section, then the new size, and finally "(new - old) /
>> old" to show the % change.
>>
>> MultiSource/Benchmarks/Fhourstones-3.1/Output/fhourstones3.1.simple,
>> 4872, 4984, 0.022988505747126436
>
> The actual numbers don't matter, but it's an interesting proxy for our
> handling of the following struct from that benchmark:
>
> typedef struct {
>   unsigned biglock:LOCKSIZE;
>   unsigned bigwork:6;
>   unsigned newlock:LOCKSIZE;
>   unsigned newscore:3;
>   unsigned bigscore:3;
> } hashentry;
>
> The code-size increase probably indicates we're not doing a good job
> of narrowing the large loads generated by the new bitfield code.
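For concreteness: this collection is exactly 64 bits wide (see below), which
forces LOCKSIZE to be 26 here (26 + 6 + 26 + 3 + 3 = 64). On the usual
little-endian x86-64 layout that puts biglock at bits [0,26), bigwork at
[26,32), newlock at [32,58), newscore at [58,61), and bigscore at [61,64).
A rough C sketch of what the new code's wide access amounts to for a read
of newscore (the helper name and exact layout are my assumptions, not from
the benchmark source):

#include <stdint.h>
#include <string.h>

/* One target-legal 64-bit load of the whole collection, then a
   shift and mask to extract newscore from bits [58,61). */
static unsigned read_newscore_wide(const void *entry) {
  uint64_t word;
  memcpy(&word, entry, sizeof word);      /* single 64-bit load */
  return (unsigned)((word >> 58) & 0x7);  /* newscore */
}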
I'm not sure what constitutes a "good job" here... The primary difference is
that this is a 64-bit machine, so codegen only tries to narrow the loads and
stores to 64-bit loads and stores. The old bitfield code unconditionally
split the loads and stores into 32-bit chunks.

This particular bitfield collection is exactly 64 bits wide, so I would
expect loading and storing 64 bits at a time to be a good thing on the
whole: it should be able to do fewer loads and stores and instead perform
bit-wise arithmetic to extract the values. Notably, there are none of the
"strange sized" loads or stores that Chris was worried about; these are all
nice, target-legal, 64-bit integer loads and stores.

Looking at the code in this benchmark, it goes both ways depending on the
circumstance: there are some places where doing 64-bit wide operations
generates better code, and other places where it generates worse code (both
in terms of size and expected performance).

From what I can tell, the code-size issues here are just a result of the
backend not being clever enough to split wide operations when doing so makes
the code significantly smaller, for example by removing the need for very
large immediate masks. We should improve the backend for these kinds of
inputs, and then we'll be able to have the best of both worlds: wide stores
when preferable, and narrow ones when that simplifies something. We can't
generally achieve that unless the frontend starts off by using the wide
loads and stores.
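To illustrate the immediate-mask point, here is a sketch of a newscore store
under each strategy (same assumed layout as the read sketch above; the
helper names are mine):

#include <stdint.h>
#include <string.h>

/* Wide strategy: one 64-bit read-modify-write.  The clearing mask
   ~(0x7ull << 58) does not fit in a sign-extended imm32, so x86-64
   needs a separate 10-byte movabs to materialize it. */
static void write_newscore_wide(void *entry, unsigned v) {
  uint64_t word;
  memcpy(&word, entry, sizeof word);
  word &= ~(0x7ull << 58);
  word |= (uint64_t)(v & 0x7) << 58;
  memcpy(entry, &word, sizeof word);
}

/* Split strategy (roughly what the old code did): touch only the
   32-bit half containing the field, so every constant is a plain
   imm32 that folds directly into the and/or instructions. */
static void write_newscore_narrow(void *entry, unsigned v) {
  uint32_t hi;
  memcpy(&hi, (char *)entry + 4, sizeof hi);  /* bits [32,64) */
  hi &= ~(0x7u << 26);                        /* newscore is [26,29) here */
  hi |= (v & 0x7) << 26;
  memcpy((char *)entry + 4, &hi, sizeof hi);
}

The wide form does fewer memory operations but pays for the 64-bit constant;
the narrow form keeps every constant small. That trade-off, repeated across
the benchmark's store sites, is the kind of size difference described above.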
