> On Aug 29, 2019, at 11:15 AM, Linus Torvalds <[email protected]> wrote:
>
> On Thu, Aug 29, 2019 at 10:36 AM Nick Desaulniers
> <[email protected]> wrote:
>> I'm curious what "the size of the asm" means, and how it differs
>> precisely from "how many instructions GCC thinks it is." I would
>> think those are one and the same? Or maybe "the size of the asm"
>> means the size in bytes when assembled to machine code, as opposed to
>> the count of assembly instructions?
>
> The problem is that we do different sections in the inline asm, and
> the instruction counts are completely bogus as a result.
>
> The actual instruction in the code stream may be just a single
> instruction. But the out-of-line sections can be multiple instructions
> and/or a data section that contains exception information.
>
> So we want the asm inlined, because the _inline_ part (and the hot
> instruction) is small, even though the asm technically maybe generates
> many more bytes of additional data.
>
> The worst offenders for this tend to be
>
> - various exception tables for user accesses etc
>
> - "alternatives" where we list two or more different asm alternatives
>   and then pick the right one at boot time depending on CPU ID flags
>
> - "BUG_ON()" instructions where there's a "ud2" instruction and
>   various data annotations going with it
>
> so gcc may be "technically correct" that the inline asm statement
> contains ten instructions or more, but the actual instruction _code_
> footprint in the asm is likely just a single instruction or two.
>
> The statement counting is also completely off by the fact that some of
> the "statements" are assembler directives (ie the
> ".pushsection"/".popsection" lines etc). So some of it is that the
> instruction counting is off, but the largest part is that it's just
> not relevant to the code footprint in that function.
>
> Un-inlining a function because it contains a single inline asm
> instruction is not productive. Yes, it might result in a smaller
> binary over-all (because all those other non-code sections do take up
> some space), but it actually results in a bigger code footprint.

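To make the mismatch above concrete, here is a rough sketch of the pattern being described. It is not kernel code: the function name add_one and the section name .demo_table are made up for illustration, and the layout of the table entry is only an assumption loosely modelled on a relative exception-table entry. The asm statement spans several lines, but only one of them is an instruction that lands in the function's code stream; the rest are directives and out-of-line data.

```c
/*
 * Hypothetical illustration only: add_one() and the ".demo_table"
 * section are invented names, not anything from the kernel tree.
 */
static inline long add_one(long x)
{
#if defined(__x86_64__) && defined(__ELF__) && defined(__GNUC__)
	__asm__ volatile(
		"1:\n\t"
		"addq $1, %0\n\t"        /* the only instruction in the code stream */
		".pushsection .demo_table, \"a\"\n\t" /* directive: switch to a side section */
		".balign 4\n\t"          /* directive: align the table entry */
		".long 1b - .\n\t"       /* data: offset from this entry back to label 1 */
		".long 0\n\t"            /* data: placeholder annotation word */
		".popsection"            /* directive: return to the code section */
		: "+r"(x)
		:
		: "cc");
	return x;
#else
	/* Plain C fallback so the sketch still builds on other targets. */
	return x + 1;
#endif
}
```

A size estimate that counts lines or separators sees seven "statements" here, while the code footprint inside any caller is the single addq; add_one(41) returns 42 either way. As far as I understand it, this is the case the "inline" qualifier on asm statements in GCC 9 and later is meant to handle, by telling the inliner to treat the asm as minimum size.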
For the record, here is my failed attempt to address the issue without GCC support: https://lore.kernel.org/lkml/[email protected]/T/

