On Thu, May 21, 2020 at 10:21 PM 'Nick Desaulniers' via Clang Built Linux
<[email protected]> wrote:
>
> On Thu, May 21, 2020 at 7:22 AM 'Marco Elver' via Clang Built Linux
> <[email protected]> wrote:
> >
> > It appears that compilers have trouble with nested statement
> > expressions. Therefore remove one level of statement expression nesting
> > from the data_race() macro. This will help us avoid potential problems
> > in future as its usage increases.
> >
> > Link: https://lkml.kernel.org/r/[email protected]
> > Acked-by: Will Deacon <[email protected]>
> > Signed-off-by: Marco Elver <[email protected]>
>
> Thanks Marco, I can confirm this series fixes the significant build
> time regressions.
>
> Tested-by: Nick Desaulniers <[email protected]>
>
> More measurements in: https://github.com/ClangBuiltLinux/linux/issues/1032
>
> Might want:
> Reported-by: Borislav Petkov <[email protected]>
> Reported-by: Nathan Chancellor <[email protected]>
> too.

I find this patch only solves half the problem: it's much faster than
without the patch, but still much slower than the current mainline
version. As far as I'm concerned, the build speed regression compared
to mainline is not yet acceptable, and we should try harder.

I have not looked too deeply at it yet, but this is what I found from
looking at a file in a randconfig build:

Configuration: see https://pastebin.com/raw/R9erCwNj

== Current linux-next ==
with "data_race: Avoid nested statement expression" and
"compiler.h: Remove data_race() and unnecessary checks from {READ,WRITE}_ONCE()"

$ touch fs/ocfs2/journal.c ; cp ../arch/x86/configs/0xFFA843AA_defconfig obj-x86/.config ; perf stat make olddefconfig O=obj-x86/ CC=clang-11 fs/ocfs2/journal.i ARCH=x86 -skj30 ; wc obj-x86/fs/ocfs2/journal.i
48741 552950 9010050 obj-x86/fs/ocfs2/journal.i

real 0m12.514s
user 0m10.270s
sys 0m2.668s

== Same tree, without those two ==

$ touch fs/ocfs2/journal.c ; cp ../arch/x86/configs/0xFFA843AA_defconfig obj-x86/.config ; time make olddefconfig O=obj-x86/ CC=clang-11 fs/ocfs2/journal.i ARCH=x86 -skj30 ; wc obj-x86/fs/ocfs2/journal.i

real 1m35.968s
user 1m33.579s
sys 0m3.523s
48741 1926607 36542560 obj-x86/fs/ocfs2/journal.i

== Mainline Linux ==

$ touch fs/ocfs2/journal.c ; cp ../arch/x86/configs/0xFFA843AA_defconfig obj-x86/.config ; time make olddefconfig O=obj-x86/ CC=clang-11 fs/ocfs2/journal.i ARCH=x86 -skj30 ; wc obj-x86/fs/ocfs2/journal.i

real 0m6.529s
user 0m4.389s
sys 0m2.561s
47377 377887 4178633 obj-x86/fs/ocfs2/journal.i

So both the size of the preprocessed file and the time to preprocess
it are still roughly twice as bad for linux-next compared to mainline.
Actually compiling the preprocessed file is very quick, so I guess the
preprocessing accounts for nearly all of the time.

      Arnd
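
For readers wondering why nesting inflates the .i file so quickly, here is
a minimal sketch of the general mechanism. It assumes GNU C (statement
expressions and __typeof__, as supported by gcc and clang). TWICE() is a
hypothetical macro made up for illustration, not the kernel's data_race()
or READ_ONCE(); the function names are likewise invented. Because the macro
mentions its argument twice, every extra level of nesting more than doubles
the preprocessed text.

```c
#include <stddef.h>

/* Two-step stringification: STR() macro-expands its argument first,
 * then STR_() turns the resulting tokens into a string literal. */
#define STR_(...) #__VA_ARGS__
#define STR(...)  STR_(__VA_ARGS__)

/* HYPOTHETICAL macro for illustration only (not a kernel macro):
 * a GNU C statement expression that mentions its argument twice,
 * once inside __typeof__() and once as the initializer. */
#define TWICE(x) ({ __typeof__(x) tmp_ = (x); tmp_; })

/* Size in bytes (including the trailing NUL) of the preprocessed
 * text produced at nesting depth 1, 2 and 3. */
size_t expansion_size_depth1(void)
{
	return sizeof(STR(TWICE(v)));
}

size_t expansion_size_depth2(void)
{
	return sizeof(STR(TWICE(TWICE(v))));
}

size_t expansion_size_depth3(void)
{
	return sizeof(STR(TWICE(TWICE(TWICE(v)))));
}

/* The nested form still just evaluates to the value of v; only the
 * amount of text the compiler has to chew through grows. */
int nested_value(int v)
{
	return TWICE(TWICE(TWICE(v)));
}
```

Printing the three sizes from a small main() shows the growth directly.
Running the compiler with -E on a file that uses such a macro, then wc on
the result, gives the same kind of measurement as the numbers above.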

