On Thu, Aug 11, 2016 at 01:35:16PM +0200, Jan Hubicka wrote:
> > On Mon, Jun 6, 2016 at 3:19 AM, Jan Hubicka <[email protected]> wrote:
> > > Hi,
> > > while looking into profile mismatches introduced by the backward
> > > threading pass I noticed that the heuristic seems quite simplistic.
> > > First, it should be profile sensitive and disallow duplication when
> > > optimizing cold paths. Second, it should use estimate_num_insns,
> > > because the gimple statement count is not a very realistic estimate
> > > of the final code size effect. Third, there seems to be no reason
> > > to disable the pass for functions optimized for size.
> > >
> > > If we block duplication of more than 1 insn on size-optimized paths,
> > > the pass is still able to make the majority of threading decisions
> > > that are for free and improve codegen. The code size benefit was
> > > between 0.5% and 2.7% on the testcases I tried (tramp3d, GCC modules,
> > > xalancbmk and some other stuff from around my hard drive).
> > >
> > > Bootstrapped/regtested x86_64-linux, seems sane?
> > >
> > > The pass should also avoid calling cleanup_cfg when no threading was
> > > done, and I do not see why it is guarded by expensive_optimizations.
> > > What are the main compile-time complexity limitations?
> >
> > This patch caused a huge regression (~11%) on CoreMark on ThunderX.
> > I assume it affects other targets too. Basically, it looks like the
> > path is no longer jump threaded.
>
> Sorry for the late reply. I checked our periodic testers and the patch
> seems more or less performance neutral, with some code size improvements.
> Can you point me to the path that is no longer threaded? I added
> diagnostic output, so you should see the reason why the path was
> considered unprofitable - either it was cold or we exceeded the maximal
> size. The size limit is largely untuned, so perhaps we can just adjust it.
Yes, it is the "cold" edge heuristic that rejects these paths:

  FSM jump-thread path not considered: duplication of 18 insns is needed
  and optimizing for size.
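
For context, the check that emits this message has roughly the following
shape (a sketch only, not the exact GCC source; profitable_path_p is a
hypothetical wrapper, n_insns is assumed to be the estimate_num_insns
total over the path, and optimize_edge_for_speed_p is the real predicate
from predict.h). Note that the predicate returns false both at -Os and
for cold edges, which is why a cold edge produces an "optimizing for
size" message even at -O2:

  /* Sketch of the backward threader's profitability check (hypothetical
     fragment against GCC internals, not the committed code).  */
  static bool
  profitable_path_p (edge taken_edge, int n_insns)
  {
    /* Cold paths and -Os: only thread when at most one instruction
       would need to be duplicated.  */
    if (!optimize_edge_for_speed_p (taken_edge) && n_insns > 1)
      {
	if (dump_file && (dump_flags & TDF_DETAILS))
	  fprintf (dump_file,
		   "FSM jump-thread path not considered: duplication of "
		   "%i insns is needed and optimizing for size.\n", n_insns);
	return false;
      }
    return true;
  }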
For whatever reason, if I dump EDGE_FREQUENCY in thread2 for the rejected
edges that we used to accept, they have frequency 0 (!). This happens
because the source block of the edges (the controlling switch) has
frequency 0:
;; basic block 7, loop depth 3, count 0, freq 0
;; prev block 6, next block 8, flags: (NEW, REACHABLE)
;; pred: 5 [95.0%] (FALSE_VALUE,EXECUTABLE)
switch (state_504) <default: <L31>, case 0: <L24>, case 2: <L25>, case 3:
<L28>, case 4: <L26>, case 5: <L27>, case 6: <L29>, case 7: <L30>>
;; succ: 36 [12.5%] (EXECUTABLE)
;; 8 [12.5%] (EXECUTABLE)
;; 18 [12.5%] (EXECUTABLE)
;; 30 [12.5%] (EXECUTABLE)
;; 22 [12.5%] (EXECUTABLE)
;; 26 [12.5%] (EXECUTABLE)
;; 32 [12.5%] (EXECUTABLE)
;; 34 [12.5%] (EXECUTABLE)
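
That makes the arithmetic inevitable: GCC derives an edge's frequency
from its source block's frequency scaled by the edge's branch probability
(the EDGE_FREQUENCY macro), so a frequency-0 block makes every outgoing
edge look cold no matter what its probability is. A minimal
self-contained illustration (edge_frequency is a hypothetical stand-in
for the macro; REG_BR_PROB_BASE really is GCC's fixed-point probability
base, 10000):

  #include <stdio.h>

  #define REG_BR_PROB_BASE 10000  /* 100% == 10000 in GCC's fixed point.  */

  /* Stand-in for GCC's EDGE_FREQUENCY: source block frequency scaled
     by the edge's branch probability.  */
  static int
  edge_frequency (int src_frequency, int probability)
  {
    return src_frequency * probability / REG_BR_PROB_BASE;
  }

  int
  main (void)
  {
    /* The switch block above: frequency 0, successors at 12.5% each.
       Every outgoing edge computes to 0, so the paths look cold.  */
    printf ("%d\n", edge_frequency (0, 1250));     /* prints 0 */

    /* With a sane source frequency the same edge is clearly hot.  */
    printf ("%d\n", edge_frequency (8000, 1250));  /* prints 1000 */
    return 0;
  }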
It looks like it is thread1 that creates these blocks with zeroed
frequency information.
I haven't managed to find a reduced testcase that shows this issue.
Thanks,
James