https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66240

            Bug ID: 66240
           Summary: RFE: extend -falign-xyz syntax
           Product: gcc
           Version: 5.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vda.linux at googlemail dot com
  Target Milestone: ---

Experimentally, compilation with
-O2 -falign-functions=17 -falign-loops=17 -falign-jumps=17 -falign-labels=17
results in the following:
- functions are aligned using ".p2align 5,,16" asm directive
- loops/jumps/labels are aligned using ".p2align 5"

-Os -falign-functions=17 -falign-loops=17 -falign-jumps=17 -falign-labels=17
results in the following:
- functions are not aligned
- loops/jumps/labels are aligned using ".p2align 5"
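
The mapping from N=17 to ".p2align 5,,16" observed above can be sketched as
follows. This is only a model of the observed output, not GCC's actual code;
the function name and its parameters are made up for illustration.

```python
def p2align_directive(n: int, with_max_skip: bool = True) -> str:
    """Model of the .p2align directive observed for -falign-*=n.

    Hypothetical helper: it reproduces the output reported above,
    it is not taken from GCC's sources.
    """
    if n < 1:
        raise ValueError("alignment must be positive")
    # Exponent of the smallest power of two >= n, e.g. 17 -> 5 (32 bytes).
    log2 = (n - 1).bit_length()
    if with_max_skip:
        # Third operand: skip at most n-1 bytes (seen for functions at -O2).
        return f".p2align {log2},,{n - 1}"
    # No max-skip operand (seen for loops/jumps/labels).
    return f".p2align {log2}"


print(p2align_directive(17))
print(p2align_directive(17, with_max_skip=False))
```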

Can this be improved so that in all cases, ".p2align 5,,16" is used? Shouldn't
be that hard...


Next step (what this RFE is all about). -falign-functions=N is too simplistic.
Ingo Molnar ran some tests, and it looks like on the latest x86 CPUs 64-byte
alignment runs fastest (he tried many other possibilities).

However, developers are less than thrilled by the idea of blindly aligning
everything to 64 bytes. Too much waste:

On 05/20/2015 02:47 AM, Linus Torvalds wrote:
> At the same time, I have to admit that I abhor a 64-byte function
> alignment, when we have a fair number of functions that are (much)
> smaller than that.
> 
> Is there some way to get gcc to take the size of the function into
> account? Because aligning a 16-byte or 32-byte function on a 64-byte
> alignment is just criminally nasty and wasteful.

I propose the following: align functions to 64-byte boundaries *IF* this does
not introduce a huge amount of padding.

GNU as already has support for this:

.align N1,FILL,N3

"The third expression is also absolute, and is also optional.
If it is present, it is the maximum number of bytes that should
be skipped by this alignment directive."

So, what we want is to put something like ".align 64,,7"
before every function. 98% of functions in a typical Linux kernel have a first
instruction that is 7 or fewer bytes long. Thus, with ".align 64,,7", calling
any function will at a minimum be able to fetch one insn in one L1 read, not
two. And this would be achieved with only ~3.5 bytes per function wasted to
padding on average, whereas ".align 64" would waste 31 bytes on average.
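
To make the effect of the third ".align" operand concrete, here is a small
model of how much padding the assembler would emit at a given offset. The
helper names are hypothetical, and the averaging assumes that function start
offsets are uniformly distributed modulo the alignment. That assumption is
mine, not the report's, so the averages it yields need not match the estimates
given above.

```python
from typing import Optional


def padding(offset: int, align: int, max_skip: Optional[int] = None) -> int:
    """Padding bytes emitted at `offset` for '.align align,,max_skip'.

    Models the GNU as rule quoted above: if reaching the boundary would
    skip more than max_skip bytes, no alignment is done at all.
    """
    pad = -offset % align  # distance to the next multiple of `align`
    if max_skip is not None and pad > max_skip:
        return 0
    return pad


def mean_padding(align: int, max_skip: Optional[int] = None) -> float:
    """Average padding, ASSUMING start offsets are uniform modulo `align`."""
    return sum(padding(o, align, max_skip) for o in range(align)) / align


# A function starting 6 bytes before a 64-byte boundary gets padded ...
print(padding(58, 64, 7))
# ... one starting 24 bytes before it is left where it is ...
print(padding(40, 64, 7))
# ... whereas plain '.align 64' always pads up to the boundary.
print(padding(40, 64))
print(mean_padding(64, 7), mean_padding(64))
```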

Please extend -falign-foo=N syntax to, say, -falign-foo=N,M, which generates
".align M,,N-1" or equivalent.

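For illustration only, the proposed -falign-foo=N,M mapping could look like the
sketch below. The function name is hypothetical, and the "N,M" syntax is the
one suggested in this report, not an existing GCC option format.

```python
def proposed_align_directive(value: str) -> str:
    """Translate a proposed 'N,M' option value into '.align M,,N-1'.

    Hypothetical: follows the syntax suggested in this RFE, where M is
    the alignment and N-1 is the maximum number of bytes to skip.
    """
    n_text, m_text = value.split(",")
    n, m = int(n_text), int(m_text)
    if n < 1 or m < 1:
        raise ValueError("N and M must be positive")
    return f".align {m},,{n - 1}"


# -falign-functions=8,64 would yield the directive discussed above.
print(proposed_align_directive("8,64"))
```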