Hello Jeff and Richard:

Here is the summary of the FDO (Feedback Directed Optimization) performance 
results.

SPEC CPU2000 INT benchmarks.
a) FDO + Splitting Paths enabled + tracer enabled
     Geomean Score = 3907.751673.
b) FDO + No Splitting Paths + tracer enabled
     Geomean Score = 3895.191536.

SPEC CPU2000 FP benchmarks.
a) FDO + Splitting Paths enabled + tracer enabled
     Geomean Score = 4793.321963
b) FDO + No Splitting Paths + tracer enabled
     Geomean Score = 4770.855467

The gains are highest with Split Paths and the tracer pass both enabled, 
compared to the tracer pass enabled without Split Paths. The 
Split Paths pass is very much required.
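
For reference, the relative gains implied by the geomean scores above can be worked out directly. The following is only a minimal sketch: the helper name `relative_gain_percent` and the dictionary layout are illustrative assumptions, and the scores are copied verbatim from the FDO runs listed in this message.

```python
def relative_gain_percent(baseline: float, candidate: float) -> float:
    """Return the gain of `candidate` over `baseline`, in percent."""
    return (candidate / baseline - 1.0) * 100.0


# Geomean scores copied from the FDO + tracer runs reported above.
# "split" = Splitting Paths enabled, "no_split" = Splitting Paths disabled.
fdo_scores = {
    "SPEC CPU2000 INT": {"split": 3907.751673, "no_split": 3895.191536},
    "SPEC CPU2000 FP": {"split": 4793.321963, "no_split": 4770.855467},
}

for suite, scores in fdo_scores.items():
    gain = relative_gain_percent(scores["no_split"], scores["split"])
    print(f"{suite}: {gain:.2f}% gain with Splitting Paths enabled")
```

With these inputs the gains come out to roughly 0.32% for INT and 0.47% for FP.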

Thanks & Regards
Ajit

-----Original Message-----
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On 
Behalf Of Ajit Kumar Agarwal
Sent: Wednesday, December 16, 2015 3:44 PM
To: Richard Biener
Cc: Jeff Law; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli 
Hunsigida; Nagaraju Mekala
Subject: RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation



-----Original Message-----
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On 
Behalf Of Richard Biener
Sent: Wednesday, December 16, 2015 3:27 PM
To: Ajit Kumar Agarwal
Cc: Jeff Law; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli 
Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation

On Wed, Dec 16, 2015 at 8:43 AM, Ajit Kumar Agarwal 
<ajit.kumar.agar...@xilinx.com> wrote:
> Hello Jeff:
>
> Here is more of the data you asked for.
>
> SPEC FP benchmarks.
> a) No Path Splitting + tracer enabled
>     Geomean Score =  4749.726.
> b) Path Splitting enabled + tracer enabled.
>     Geomean Score =  4781.655.
>
> Conclusion: With both Path Splitting and tracer enabled we got the maximum 
> gains. I think we need to have the Path Splitting pass.
>
> SPEC INT benchmarks.
> a) Path Splitting enabled + tracer not enabled.
>     Geomean Score =  3745.193.
> b) No Path Splitting + tracer enabled.
>     Geomean Score = 3738.558.
> c) Path Splitting enabled + tracer enabled.
>     Geomean Score = 3742.833.

>>I suppose with SPEC you mean SPEC CPU 2006?

The performance data is with respect to SPEC CPU 2000 benchmarks.

>>Can you disclose the architecture you did the measurements on and the compile 
>>flags you used otherwise?

Intel(R) Xeon(R) CPU E5-2687W v3 @ 3.10GHz 
cpu cores       : 10
cache size      : 25600 KB

I used -O3 and enabled the tracer with -ftracer.

Thanks & Regards
Ajit
>>Note that tracer does a very good job only when paired with FDO so can you 
>>re-run SPEC with FDO and compare with path-splitting enabled on top of that?


Thanks,
Richard.

> Conclusion: We are getting more gains with Path Splitting than with the 
> tracer. With both Path Splitting and the tracer enabled we are also getting 
> gains.
> I think we should have the Path Splitting pass.
>
> One more observation: Richard's concern is the creation of multiple 
> exits when Splitting paths duplicates blocks. My observation is that the 
> tracer pass also creates multiple exits through duplication. I don’t 
> think that’s an issue in practice, considering the gains we are getting 
> from Splitting paths through more PRE, CSE and DCE.
>
> Thanks & Regards
> Ajit
>
>
>
>
> -----Original Message-----
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: Wednesday, December 16, 2015 5:20 AM
> To: Richard Biener
> Cc: Ajit Kumar Agarwal; GCC Patches; Vinod Kathail; Shail Aditya 
> Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
> Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on 
> tree ssa representation
>
> On 12/11/2015 03:05 AM, Richard Biener wrote:
>> On Thu, Dec 10, 2015 at 9:08 PM, Jeff Law <l...@redhat.com> wrote:
>>> On 12/03/2015 07:38 AM, Richard Biener wrote:
>>>>
>>>> This pass is now enabled by default with -Os but has no limits on 
>>>> the amount of stmts it copies.
>>>
>>> The more statements it copies, the more likely it is that the path 
>>> splitting will turn out to be useful!  It's counter-intuitive.
>>
>> Well, it's still not appropriate for -Os (nor -O2 I think).  -ftracer 
>> is enabled with -fprofile-use (but it is also properly driven to only 
>> trace hot paths) and otherwise not by default at any optimization level.
> Definitely not appropriate for -Os.  But as I mentioned, I really want to 
> look at the tracer code as it may totally subsume path splitting.
>
>>
>> Don't see how this would work for the CFG pattern it operates on 
>> unless you duplicate the exit condition into that new block creating 
>> an even more obfuscated CFG.
> Agreed, I don't see any way to fix the multiple exit problem.  Then again, 
> this all runs after the tree loop optimizer, so I'm not sure how big of an 
> issue it is in practice.
>
>
>>> It was only after I approved this code after twiddling it for Ajit 
>>> that I came across Honza's tracer implementation, which may in fact 
>>> be retargetable to these loops and do a better job.  I haven't 
>>> experimented with that.
>>
>> Well, I originally suggested to merge this with the tracer pass...
> I missed that, or it didn't sink into my brain.
>
>>> Again, the more statements it copies the more likely it is to be profitable.
>>> Think superblocks to expose CSE, DCE and the like.
>>
>> Ok, so similar to tracer (where I think the main benefit is actually 
>> increasing scheduling opportunities for architectures where it matters).
> Right.  They're both building superblocks, which has the effect of larger 
> windows for scheduling, DCE, CSE, etc.
>
>
>>
>> Note that both passes are placed quite late and thus won't see much 
>> of the GIMPLE optimizations (DOM mainly).  I wonder why they were not 
>> placed adjacent to each other.
> Ajit had it fairly early, but that didn't play well with if-conversion.
>   I just pushed it past if-conversion and vectorization, but before 
> the last DOM pass.  That turns out to be where tracer lives too as you noted.
>
>>>
>>> I wouldn't lose any sleep if we disabled by default or removed, 
>>> particularly if we can repurpose Honza's code.  In fact, I might 
>>> strongly support the former until we hear back from Ajit on performance 
>>> data.
>>
>> See above for what we do with -ftracer.  path-splitting should at 
>> _least_ restrict itself to operate on optimize_loop_for_speed_p () loops.
> I think we need to decide if we want the code at all, particularly 
> given the multiple-exit problem.
>
> The difficulty is I think Ajit posted some recent data that shows it's 
> helping.  So maybe the thing to do is ask Ajit to try the tracer 
> independent of path splitting and take the obvious actions based on 
> Ajit's data.
>
>
>>
>> It should also (even if counter-intuitive) limit the amount of stmt 
>> copying it does - after all there is sth like an instruction cache 
>> size which exceeding for loops will never be a good idea (and even 
>> smaller special loop caches on some archs).
> Yup.
>
>>
>> Note that a better heuristic than "at least more than one stmt" would 
>> be to have at least one PHI in the merger block.  Otherwise I don't 
>> see how CSE opportunities could exist that we don't see without the duplication.
>> And yes, more PHIs -> more possible CSE.  I wouldn't say so for the 
>> number of stmts.  So please limit the number of stmt copies!
>> (after all we do limit the number of stmts we copy during jump
>> threading!)
> Let's get some more data before we try to tune path splitting.  In an 
> ideal world, the tracer can handle this for us and we just remove path 
> splitting completely.
>
> Jeff
