On Tue, May 11, 2021 at 5:01 PM Segher Boessenkool
<seg...@kernel.crashing.org> wrote:
>
> On Tue, May 04, 2021 at 10:40:38AM +0200, Richard Biener via Gcc wrote:
> > On Mon, May 3, 2021 at 11:10 PM Andrew Pinski via Gcc <gcc@gcc.gnu.org> 
> > wrote:
> > >   I noticed my (highly, -j24) parallel build of GCC is serialized on
> > > compiling gimple-match.c.  Has anyone looked into splitting this
> > > generated file into multiple files?
> >
> > There were threads about this in the past, yes.  There's the
> > possibility to use LTO for this as well (also mentioned in those
> > threads).  Note it's not easy to split in a meaningful way in
> > genmatch.c
>
> But it will have to be handled at least somewhat soon: on not huge
> parallelism (-j120 for example) building *-match.c takes longer than
> building everything else in gcc/ together (wallclock time), and it is a
> huge part of regstrap time (bigger than running all of the testsuite!)

I would classify -j120 as "huge parallelism" ;)  Testing time still
dominates my builds (with -j24) where bootstrap takes ~20 mins
but testing another 40.

Is it building stage2 gimple-match.c that you are worried about?
(it's built using the -O0 compiled stage1 compiler - but we at
least should end up using -fno-checking for this build)

Maybe you can do some experiments - like add
-fno-inline-functions-called-once and change
genmatch.c:3766 to split out single uses as well
(should decrease function sizes).

There's the option to make all functions external in
gimple-match.c so that splitting the file at arbitrary points
becomes possible (directly from genmatch).  We'll then need
an internal header with all declarations as well, or
alternatively some clever logic in genmatch to only
externalize functions needed from multiple split files.
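
Once every function is external, distributing them over several
output files is essentially a balancing problem.  A minimal sketch,
assuming we know an approximate size for each generated function;
the function names and sizes are hypothetical and this is not how
genmatch works today:

```python
import heapq

def split_functions(sizes, n_parts):
    """Greedy partition: hand each function, largest first, to the
    part that is currently the lightest."""
    heap = [(0, i) for i in range(n_parts)]  # (total size, part index)
    parts = [[] for _ in range(n_parts)]
    for name, size in sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        total, idx = heapq.heappop(heap)
        parts[idx].append(name)
        heapq.heappush(heap, (total + size, idx))
    return parts

# Hypothetical function names and line counts, for illustration only.
sizes = {
    "simplify_a": 900,
    "simplify_b": 500,
    "simplify_c": 400,
    "simplify_d": 300,
    "simplify_e": 100,
}
parts = split_functions(sizes, 2)
for i, names in enumerate(parts):
    print(i, sum(sizes[n] for n in names), names)
```

Every part would include the shared internal header with all
declarations, so the assignment of functions to files does not
matter for correctness, only for build-time balance.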

That said - ideas to reduce the size of the generated
code are welcome as well.

There's also pattern ordering in match.pd that can
make a difference because we're honoring
first-match and thus have to re-start matching from
outermost on conflicts (most of the time the actual
order in match.pd is just random).  If you add -v
to genmatch then you'll see

/home/rguenther/src/gcc3/gcc/match.pd:6092:10 warning: failed to merge decision tree node
   (cmp (op@3 @0 INTEGER_CST@1) INTEGER_CST@2)
         ^
/home/rguenther/src/gcc3/gcc/match.pd:4263:11 warning: with the following
    (cmp (op @0 REAL_CST@1) REAL_CST@2)
          ^
/home/rguenther/src/gcc3/gcc/match.pd:5164:6 warning: because of the following which serves as ordering barrier
 (eq @0 integer_onep)
     ^

That means that the simple (eq @0 integer_onep) at line 5164 should
match after 4263 but before 6092.  Only the latter can actually match
the same expressions: the former has REAL_CST@2, whereas 5164 uses the
predicate integer_onep.  This causes us to emit three
switch (code){ case EQ_EXPR: } instead of one.

There might be legitimate cases of such order constraints but most of them
are spurious.  "Fixing" them will also make the matching process faster, but
it's quite some legwork where moving a pattern can fix one occurrence but
result in new others.

For me building stage3 gimple-match.o (on a fully loaded system.. :/) is

95.05user 0.42system 1:35.51elapsed 99%CPU (0avgtext+0avgdata 929400maxresident)k
0inputs+0outputs (0major+393349minor)pagefaults 0swaps

and when I use -Wno-error -flto=24 -flinker-output=nolto-rel -r

139.95user 1.79system 0:25.92elapsed 546%CPU (0avgtext+0avgdata 538852maxresident)k
0inputs+0outputs (0major+1139679minor)pagefaults 0swaps

The issue of course is that we can't use this for the stage1 build
(unless we detect working GCC LTO in the host compiler setup).  I
suppose those measurements show the lower bound of what should be
possible with splitting up the file (LTO splits to 128 pieces), so
for me it's roughly a 4x speedup in wallclock time (1:35.51 vs.
0:25.92) despite the overhead of LTO, which is quite noticeable.
-fno-checking also makes a dramatic difference for me.
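
For reference, the ratios behind those two time(1) outputs can be
recomputed as follows.  This is just arithmetic on the numbers quoted
above; the helper name is made up for the example:

```python
def parse_elapsed(text):
    """Convert time(1)'s [h:]m:s.ss elapsed field to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds

# Figures copied from the two measurements above.
serial = {"user": 95.05, "system": 0.42, "elapsed": parse_elapsed("1:35.51")}
lto = {"user": 139.95, "system": 1.79, "elapsed": parse_elapsed("0:25.92")}

# Wall-clock gain from building the pieces in parallel.
speedup = serial["elapsed"] / lto["elapsed"]
# Extra CPU time (user + system) spent because of LTO.
cpu_overhead = (lto["user"] + lto["system"]) / (serial["user"] + serial["system"])

print(f"wall-clock speedup: {speedup:.2f}x")
print(f"CPU-time overhead: {cpu_overhead:.2f}x")
```

So the LTO build burns about half again as much CPU time in total
while still finishing in a bit more than a quarter of the wallclock
time.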

Richard.

>
> Segher
