[clang] [clang] Better bitfield access units (PR #65742)

Nathan Sidwell via cfe-commits Mon, 11 Sep 2023 04:27:42 -0700

urnathan wrote:

> The advantage of exposing the wide accesses to the optimizer is that it 
> allows memory optimizations, like CSE or DSE, to reason about the entire 
> bitfield as a single unit, and easily eliminate accesses. Reducing the size 
> of bitfield accesses in the frontend is basically throwing away information.


Hm, I've been thinking of it the opposite way round -- merging bitfields is 
throwing away information (about where cuts might be).  And it's unclear to me 
how CSE or DSE could make use of a merged access unit to eliminate accesses -- 
it would seem to me that a merged access unit accessed at a single point would 
make it look like the whole unit was live? (Can you point me at an example of 
the analysis you describe happening?)

The simple x86 example I showed does not (completely) undo the merging, which 
suggests it is hard for CSE and DSE to do the analysis you describe. DSE did 
work there, to undo the merge, but no dead load elimination happened. And that 
DSE is merely undoing the gluing that the front end did -- if we didn't glue, 
we would always get that result.
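
To make "gluing" concrete, here is a rough C++ sketch of the two access shapes 
for a store to one field. It is only an illustration, not the IR clang emits: 
the two-byte Storage layout and the names store_b_merged and store_b_split are 
made up for this example. The merged form has to load the whole unit, including 
the neighbouring field, before it can store; the split form is a single narrow 
store.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical storage for two 8-bit fields: 'a' in byte 0, 'b' in byte 1.
struct Storage { unsigned char bytes[2]; };

// Build a 16-bit value from its in-memory bytes (endian-independent).
inline std::uint16_t unit_from_bytes(unsigned char b0, unsigned char b1) {
    unsigned char raw[2] = {b0, b1};
    std::uint16_t u;
    std::memcpy(&u, raw, sizeof u);
    return u;
}

// Merged access unit: storing 'b' is a 16-bit read-modify-write,
// so the load also touches 'a' even though 'a' is not changed.
inline void store_b_merged(Storage &s, std::uint8_t v) {
    std::uint16_t unit;
    std::memcpy(&unit, s.bytes, sizeof unit);            // wide load
    const std::uint16_t mask = unit_from_bytes(0x00, 0xFF);
    const std::uint16_t val  = unit_from_bytes(0x00, v);
    unit = static_cast<std::uint16_t>((unit & ~mask) | val);
    std::memcpy(s.bytes, &unit, sizeof unit);            // wide store
}

// Separate access units: storing 'b' is one narrow store, no load.
inline void store_b_split(Storage &s, std::uint8_t v) {
    s.bytes[1] = v;
}

// Both forms leave memory in the same state; only the accesses differ.
inline void demo_equivalence() {
    Storage m{{0x12, 0x34}};
    Storage n{{0x12, 0x34}};
    store_b_merged(m, 0xAB);
    store_b_split(n, 0xAB);
    assert(std::memcmp(m.bytes, n.bytes, sizeof m.bytes) == 0);
}
```

Getting from store_b_merged back to store_b_split is the "undoing" I mean 
above.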

> The disadvantage is, as you note here, that sometimes codegen doesn't manage 
> to clean up the resulting code well.

My contention is that the current algorithm both (a) fails to merge some 
mergeable access units and (b) inappropriately merges some access units, 
*especially* on strict-alignment machines.
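
As a sketch of the kind of target-aware decision I have in mind (not the 
algorithm in the PR, and not clang code: ToyTarget, Span and may_merge are 
hypothetical names, and the rule is deliberately simplified), one could merge 
two adjacent units only when the result fits in a register and, on a 
strict-alignment target, is a naturally aligned power-of-two size:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical target description, for illustration only.
struct ToyTarget {
    unsigned register_bits;   // widest cheap integer access
    bool strict_alignment;    // unaligned accesses are not allowed
};

// Byte range [begin, end) of an access unit, relative to the struct start.
struct Span {
    std::uint64_t begin;
    std::uint64_t end;
};

inline bool is_power_of_two(std::uint64_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// Simplified rule: may units 'a' and 'b' be accessed as one unit?
inline bool may_merge(Span a, Span b, const ToyTarget &t) {
    assert(a.begin < a.end && b.begin < b.end);
    if (a.end != b.begin)
        return false;                         // only glue adjacent units
    const std::uint64_t size = b.end - a.begin;
    if (size * 8 > t.register_bits)
        return false;                         // wider than a register
    if (t.strict_alignment &&
        (!is_power_of_two(size) || a.begin % size != 0))
        return false;                         // would need an unaligned access
    return true;
}
```

With a 32-bit strict-alignment ToyTarget, bytes [0,1) and [1,2) may merge, but 
[1,2) and [2,3) may not, because the two-byte result would start at an odd 
offset.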

> I guess my primary question here is, instead of making clang try to guess 
> which accesses will be optimal, can we improve the way the LLVM code 
> generator handles the patterns currently generated by clang? I'm not exactly 
> against changing the IR generated by clang, but changing the IR like this 
> seems likely to improve some cases at the expense of others.

In the testcases that were changed I did not encounter one that generated worse 
code. An ARM testcase showed better code due to, IIRC, not merging more than a 
register width: as with the x86 case, the current code did not eliminate 
unnecessary loads, whereas with this patch those are gone. I would be very 
interested in seeing cases that degrade, though.


https://github.com/llvm/llvm-project/pull/65742
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
