On Wed, May 12, 2021 at 02:56:56PM +0200, Christophe Leroy wrote:
> Le 11/05/2021 à 12:51, Segher Boessenkool a écrit :
> >Something seems to have decided this asm is more expensive than it is.
> >That isn't always avoidable -- the compiler cannot look inside asms --
> >but it seems it could be improved here.
> >
> >Do you have (or can make) a self-contained testcase?
>
> I have not tried, and I fear it might be difficult, because on a kernel
> build with dozens of calls to csum_add(), only ip6_tunnel.o exhibits such
> an issue.

Yeah.  Sometimes you can force some of the decisions, but that usually
requires knowing too many GCC internals :-/

> >>And there is even one completely unused instance of csum_add().
> >
> >That is strange, that should never happen.
>
> It seems that several .o include unused versions of csum_add. After the
> final link, one remains (in addition to the used one) in vmlinux.

But it is a static function, so it should not end up in any object file
where it isn't used.

> >>In the non-inlined version, the first sum with 0 was performed.
> >>Here it is skipped.
> >
> >That is because of how __builtin_constant_p works, most likely.  As we
> >discussed elsewhere it is evaluated before all forms of loop unrolling.
>
> But we are not talking about loop unrolling here, are we ?

Oh, right you are, but that doesn't change much.  The
__builtin_constant_p(len) is evaluated long before the compiler sees len
is a constant here.

> It seems that the reason here is that __builtin_constant_p() is evaluated
> long after GCC decided to not inline that call to csum_add().

Yes, it seems we do not currently do even trivial inlining except very
early in the compiler.

Thanks,


Segher
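
For readers following the thread, here is a rough sketch of the pattern
under discussion.  It is not the powerpc code: csum_add_sketch() is a
hypothetical, portable stand-in (plain C instead of inline asm), and it
assumes GCC or Clang for __builtin_constant_p().  The point is only that
the zero-addend shortcut is taken when the compiler already knows the
argument is constant at the time the builtin is evaluated; otherwise the
full add-with-carry is emitted, which still gives the right answer.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for csum_add(): 32-bit ones' complement addition.
 * Assumes a compiler that provides __builtin_constant_p() (GCC, Clang).
 */
static inline uint32_t csum_add_sketch(uint32_t csum, uint32_t addend)
{
	/*
	 * Shortcut for a compile-time-constant zero addend.  Whether this
	 * branch survives depends on when the compiler evaluates the
	 * builtin relative to inlining; if it evaluates to 0, we simply
	 * fall through to the generic path below.
	 */
	if (__builtin_constant_p(addend) && addend == 0)
		return csum;

	uint32_t res = csum + addend;

	/* A wrap-around means a carry out of bit 31: fold it back in. */
	return res + (res < addend);
}
```

Both paths return the same value, so a call such as
csum_add_sketch(sum, 0) is correct whether or not the shortcut is taken;
only the generated code differs.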