On Thu, 2010-04-29 at 11:02 +1000, Benjamin Herrenschmidt wrote:
> > The option Alan added reduces the footprint to 3 instructions which can
> > be noped out completely. The rest of the function does not rely on the first
> > three instructions. No stack spill is forced either:
> >
> > # gcc -pg -mprofile-kernel
>
> From a quick test it appears that this only works with -m64, not -m32.
> Alan is that correct ? Any chance you can fix that in future gcc
> versions ?
>
> Also should we implement support for both type of mcounts or just only
> allow enabling of ftrace with gcc's that support this ?

Also, Anton noticed :

> Cheers,
> Ben.
>
> > 0000000000000000 <.foo>:
> >    0:   7c 08 02 a6     mflr    r0
> >    4:   f8 01 00 10     std     r0,16(r1)

The std is not useful here. We can do it inside mcount.

> >    8:   48 00 00 01     bl      8 <.foo+0x8>  <--- call to mcount

And I noticed:

> >    c:   7c 08 02 a6     mflr    r0

I'm happy to guarantee that mcount does the above.

> >   10:   f8 01 00 10     std     r0,16(r1)

And maybe that one too. However I understand if it's easier not to
change the prolog codegen (the 2 insn above) and just stick to adding
a 2 or 3 instructions boilerplate at the top.

Cheers,
Ben.

> >   14:   f8 21 ff d1     stdu    r1,-48(r1)
> >   18:   e9 22 00 00     ld      r9,0(r2)
> >   1c:   e8 69 00 02     lwa     r3,0(r9)
> >   20:   38 21 00 30     addi    r1,r1,48
> >   24:   e8 01 00 10     ld      r0,16(r1)
> >   28:   7c 08 03 a6     mtlr    r0
> >   2c:   4e 80 00 20     blr
> >
> >
> > This mean we could support ftrace function trace with very little overhead.
> >
> > In fact if we are careful when switching to the new mcount ABI and don't
> > rely on the store of r0, we could probably optimise this even further in a
> > future gcc and remove the store completely. mcount would be 2 instructions:
> >
> >         mflr    r0
> >         bl      8 <.foo+0x8>
> >
> > Anton
>
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev