On 16/08/19 at 17:22, Étienne Mollier wrote:
> Hello,
> 
> Whoops, it sounds like my wording was not very clear.  In your
> place I would proceed that way, but I don't have a Piledriver
> CPU to do actual testing on my side.  I'm still stuck with an
> old K10, not to mention my laptop, which comes with an old
> regular Atom.  :)
> 
> I did try replacing the k8 option with amdfam10, though.  In
> the roughly fifty thousand lines of logs issued by the build, I
> get something like a dozen differences between k8 and k10.
> There was a tremendous number of warnings too, but some of the
> ones you encountered did not appear: neither the one about the
> missing jump target, for instance, nor the
> ANNOTATE_NOSPEC_ALTERNATIVE one about the retpoline.  I am
> running Debian Sid, currently shipping with GCC 9, so this is a
> difference to take into account.  Finally, building an upstream
> Linux 5.2 kernel instead of Buster's 4.19 does not show most of
> the warnings I encountered, as these are fixed as they come
> upstream, but probably not as thoroughly in LTS kernels.
> 
> Doing a third run with the addition of the tuning options
> (-mtune) made almost no difference at all, except for the build
> number and the CRC hash.  It seems to me that the
> architecture-specific (-march) option already applies the proper
> tuning, at least for my architecture.
> 
> My last experiment consisted of building upstream Linux 5.2.9,
> released recently, with -march=amdfam10, and this one is running
> quite well so far:
> 
>       $ uname -rv
>       5.2.9-k10 #1 SMP PREEMPT Fri Aug 16 16:13:08 CEST 2019
> 
> But again, no messages worth mentioning during the compilation.
> 
> Do your warnings appear when your build targets k8?
> Or when building a generic x86_64 kernel?

Actually, I run a kernel built with the "k8" option; it works fine, and I
got no warnings during compilation.

Looking deeper into your tip about "amdfam10", I checked the GCC
options web page:
https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
The amdfam10 optimization is for Family 10h CPUs, but I have a Family 15h
CPU. I noticed that there is also a "bdver1" option for my CPU family, so
I wanted to give it a try: I compiled the kernel source with "bdver1" and,
surprise, I got no warnings and everything worked fine. :-) The command
line I use to compile is:

~/linux-source-4.19$ time make -s -j9 ; make -s -j9 modules
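
To check which family a machine reports before picking an -march value, something like the following might help. It is only a sketch: march_for_amd_family is a hypothetical helper name, and the mapping covers just the three AMD families discussed in this thread. The kernel prints the family in decimal, so Family 15h shows up as 21. GCC's x86 options page also lists bdver2 to bdver4 for later Family 15h cores, so bdver1 is treated here as a baseline only.

```shell
# Map a decimal "cpu family" value (as printed in /proc/cpuinfo) to a
# baseline GCC -march name for AMD CPUs.
# Hypothetical helper, not part of any kernel or GCC tooling.
march_for_amd_family() {
    case "$1" in
        15) echo k8 ;;        # Family 0Fh (K8)
        16) echo amdfam10 ;;  # Family 10h (K10)
        21) echo bdver1 ;;    # Family 15h baseline; later cores: bdver2..bdver4
        *)  echo x86-64 ;;    # generic fallback for anything else
    esac
}

# Report the family of the first CPU listed, if the information is available.
if [ -r /proc/cpuinfo ]; then
    family=$(awk -F': *' '/^cpu family/ { print $2; exit }' /proc/cpuinfo)
    if [ -n "$family" ]; then
        printf 'cpu family %s (0x%x) -> -march=%s\n' \
            "$family" "$family" "$(march_for_amd_family "$family")"
    fi
fi
```

The fallback does not check the CPU vendor, so on a non-AMD machine the result is simply the generic x86-64 target.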

> Compilers may have good optimization routines to boost the speed
> of the code in several situations, but in other ones there are
> trade-offs to take between size and performance of the code.  I
> personally prefer smaller sized executables (-Os): they fit in
> fewer pages, so use less CPU cache, and leave more room for my
> programs to get more of their own data in cache (or I might
> simply have spent too much time on suckless.org.  ;)

Do you remember which kernel CONFIG option enables this optimization?
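
For reference, mainline kernels of this era select -Os through CONFIG_CC_OPTIMIZE_FOR_SIZE ("Optimize for size" under "General setup"). A minimal sketch for checking whether a given .config has it, or any other boolean symbol, set to y follows. kconfig_enabled is a hypothetical helper name, and the /boot/config-* path assumes a Debian-style installed kernel.

```shell
# Succeed if the Kconfig symbol $2 (given without the CONFIG_ prefix)
# is set to y in the kernel configuration file $1.
# Hypothetical helper; it only greps the file, it does not evaluate Kconfig.
kconfig_enabled() {
    grep -q "^CONFIG_$2=y\$" "$1"
}

# Example: inspect the configuration of the running kernel, if installed
# where Debian usually puts it (an assumption about the system).
cfg="/boot/config-$(uname -r)"
if [ -r "$cfg" ]; then
    for sym in CC_OPTIMIZE_FOR_SIZE HZ_1000; do
        if kconfig_enabled "$cfg" "$sym"; then
            echo "CONFIG_$sym=y"
        else
            echo "CONFIG_$sym is not set to y"
        fi
    done
fi
```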

> 
> Activating CPU-specific options is interesting for some
> particular use cases, but newer instructions often require
> setting up various bits in the CPU before use, which tends to
> inflate the resulting executable.  This may be interesting for
> scientific applications, or programs dealing with big data
> arrays in general.  In kernel mode, however, the only cases I
> can think of where CPU-specific accelerators would be beneficial
> are disk encryption and RAID arrays, for which I believe there
> is already some runtime detection of available instructions,
> even with the generic compiler options.

I have four disks in a software RAID 5 array on my system, managed by
mdadm. This is my /proc/mdstat file:

$ cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active raid5 sda1[0] sdb1[1] sdd1[3](S) sdc1[2]
      1953258496 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]

unused devices: <none>
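
In this output the "(S)" suffix marks sdd1 as a spare, while "[3/3]" and "[UUU]" say that three of three active members are up. A small sketch for listing spares from mdstat-formatted text is given below. md_spares is a hypothetical helper name, and the parsing assumes the usual "mdN : state level members..." line layout.

```shell
# Print "array: device" for every member flagged as spare "(S)".
# Reads the file given as $1, or /proc/mdstat when no argument is passed.
# Hypothetical helper based on the line layout shown above.
md_spares() {
    awk '/^md[0-9]+ :/ {
        for (i = 3; i <= NF; i++)
            if ($i ~ /\(S\)$/) {
                sub(/\[.*/, "", $i)   # strip the "[n](S)" role suffix
                print $1 ": " $i
            }
    }' "${1:-/proc/mdstat}"
}

# Example: only meaningful on a machine with md arrays.
if [ -r /proc/mdstat ]; then
    md_spares
fi
```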

> 
> To be honest, I don't believe the performance gain to get from
> the compiler is tremendous here.  Figures from the author of the
> patch are there to tell us there is a gain indeed; but when you
> investigate in detail the percentage of performance brought by
> the tuning, it is only about 0.03% for the selected benchmark on
> median values.  See the "Data" section at the very end of the
> README, and do your own calculations:
> 
>       https://github.com/graysky2/kernel_gcc_patch/blob/master/README.md
> 
> The best you can do here is to take your own measurements with
> your own usage pattern.  If you are a developer, you can run timed
> builds of Linux, and see the time it takes.  If you are inclined
> toward image rendering speeds, there are a few demo-scenes out
> there where you might get a few figures such as the frame rate
> (careful, glxgears may get capped to 60Hz when some accelerators
> are in use, prefer fancier demos.  ;)
> 
> There is also this other thread dealing with kernel latency
> measures; you may find a few useful tools listed in this
> discussion:
> 
>       https://lists.debian.org/debian-user/2019/08/msg00851.html
> 
> Or just see how your usual programs perform, and whether there
> are visible improvements.
> 
> Have fun,  :)
> 
Yes, I agree the optimization won't affect performance in a way that is
perceptible to a human; there are more important tweaks in the kernel,
such as CONFIG_HZ_1000=y.
I always measure the time taken by the kernel compilation out of
curiosity.
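
One caveat when timing a build: in a command line such as "time make ... ; make ... modules", the shell's time applies only to the first command. A minimal sketch that measures a whole sequence is shown here. elapsed is a hypothetical helper name, and it relies on "date +%s", which GNU date supports, so the resolution is one second.

```shell
# Run the given command and report elapsed wall-clock seconds on stderr,
# preserving the command's exit status.
# Hypothetical helper; resolution is limited to whole seconds.
elapsed() {
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "elapsed: $((end - start))s (exit status $status)" >&2
    return "$status"
}

# Example (assumes a configured kernel tree in the current directory):
#   elapsed sh -c 'make -s -j9 && make -s -j9 modules'
```

Wrapping both make invocations in a single "sh -c" means the reported figure covers the whole sequence.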
Thanks again for the tips, best regards

-- 
Franco Martelli
