Hello Jeff and Richard:
Here is a summary of the FDO (Feedback Directed Optimization) performance
results.
SPEC CPU2000 INT benchmarks.
a) FDO + Splitting Paths enabled + tracer enabled
Geomean Score = 3907.751673
b) FDO + No Splitting Paths + tracer enabled
Geomean Score = 3895.191536
SPEC CPU2000 FP benchmarks.
a) FDO + Splitting Paths enabled + tracer enabled
Geomean Score = 4793.321963
b) FDO + No Splitting Paths + tracer enabled
Geomean Score = 4770.855467
With FDO, the scores are highest with both Split Paths and the tracer pass
enabled, compared to tracer enabled without Split Paths. I think the
Split Paths pass is very much needed.
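
For reference, the relative gain implied by each pair of geomean scores above
can be worked out with a small helper. This is only a sketch: the function
name relative_gain_percent is made up for illustration and is not part of any
SPEC or GCC tooling.

```c
#include <assert.h>

/* Relative change of score `with` over score `without`, in percent.
   Hypothetical helper, used here only to put the geomean scores in
   perspective. */
double relative_gain_percent(double with, double without)
{
    assert(without > 0.0);  /* a geomean score is always positive */
    return (with / without - 1.0) * 100.0;
}
```

Called as relative_gain_percent(3907.751673, 3895.191536) for INT and
relative_gain_percent(4793.321963, 4770.855467) for FP, this gives roughly
0.32% and 0.47% respectively.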
Thanks & Regards
Ajit
-----Original Message-----
From: [email protected] [mailto:[email protected]] On
Behalf Of Ajit Kumar Agarwal
Sent: Wednesday, December 16, 2015 3:44 PM
To: Richard Biener
Cc: Jeff Law; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli
Hunsigida; Nagaraju Mekala
Subject: RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa
representation
-----Original Message-----
From: [email protected] [mailto:[email protected]] On
Behalf Of Richard Biener
Sent: Wednesday, December 16, 2015 3:27 PM
To: Ajit Kumar Agarwal
Cc: Jeff Law; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli
Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa
representation
On Wed, Dec 16, 2015 at 8:43 AM, Ajit Kumar Agarwal
<[email protected]> wrote:
> Hello Jeff:
>
> Here is more of the data you asked for.
>
> SPEC FP benchmarks.
> a) No Path Splitting + tracer enabled
> Geomean Score = 4749.726.
> b) Path Splitting enabled + tracer enabled.
> Geomean Score = 4781.655.
>
> Conclusion: With both Path Splitting and tracer enabled we got maximum gains.
> I think we need to have the Path Splitting pass.
>
> SPEC INT benchmarks.
> a) Path Splitting enabled + tracer not enabled.
> Geomean Score = 3745.193.
> b) No Path Splitting + tracer enabled.
> Geomean Score = 3738.558.
> c) Path Splitting enabled + tracer enabled.
> Geomean Score = 3742.833.
>>I suppose with SPEC you mean SPEC CPU 2006?
The performance data is with respect to SPEC CPU 2000 benchmarks.
>>Can you disclose the architecture you did the measurements on and the compile
>>flags you used otherwise?
Intel(R) Xeon(R) CPU E5-2687W v3 @ 3.10GHz
cpu cores : 10
cache size : 25600 KB
I used -O3 and enabled the tracer with -ftracer.
Thanks & Regards
Ajit
>>Note that tracer does a very good job only when paired with FDO so can you
>>re-run SPEC with FDO and compare with path-splitting enabled on top of that?
Thanks,
Richard.
> Conclusion: We are getting more gains with Path Splitting alone than with
> tracer alone. With both Path Splitting and tracer enabled we are also
> getting gains over tracer alone.
> I think we should have the Path Splitting pass.
>
> One more observation: Richard's concern is that Splitting Paths creates
> multiple exits through duplication. The tracer pass also creates multiple
> exits through duplication, so I don’t think that’s an issue in practice,
> considering the gains we are getting from Splitting Paths through more
> PRE, CSE and DCE.
>
> Thanks & Regards
> Ajit
>
>
>
>
> -----Original Message-----
> From: Jeff Law [mailto:[email protected]]
> Sent: Wednesday, December 16, 2015 5:20 AM
> To: Richard Biener
> Cc: Ajit Kumar Agarwal; GCC Patches; Vinod Kathail; Shail Aditya
> Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
> Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on
> tree ssa representation
>
> On 12/11/2015 03:05 AM, Richard Biener wrote:
>> On Thu, Dec 10, 2015 at 9:08 PM, Jeff Law <[email protected]> wrote:
>>> On 12/03/2015 07:38 AM, Richard Biener wrote:
>>>>
>>>> This pass is now enabled by default with -Os but has no limits on
>>>> the amount of stmts it copies.
>>>
>>> The more statements it copies, the more likely it is that the path
>>> splitting will turn out to be useful! It's counter-intuitive.
>>
>> Well, it's still not appropriate for -Os (nor -O2 I think). -ftracer
>> is enabled with -fprofile-use (but it is also properly driven to only
>> trace hot paths) and otherwise not by default at any optimization level.
> Definitely not appropriate for -Os. But as I mentioned, I really want to
> look at the tracer code as it may totally subsume path splitting.
>
>>
>> Don't see how this would work for the CFG pattern it operates on
>> unless you duplicate the exit condition into that new block creating
>> an even more obfuscated CFG.
> Agreed, I don't see any way to fix the multiple exit problem. Then again,
> this all runs after the tree loop optimizer, so I'm not sure how big of an
> issue it is in practice.
>
>
>>> It was only after I approved this code after twiddling it for Ajit
>>> that I came across Honza's tracer implementation, which may in fact
>>> be retargetable to these loops and do a better job. I haven't
>>> experimented with that.
>>
>> Well, I originally suggested to merge this with the tracer pass...
> I missed that, or it didn't sink into my brain.
>
>>> Again, the more statements it copies the more likely it is to be profitable.
>>> Think superblocks to expose CSE, DCE and the like.
>>
>> Ok, so similar to tracer (where I think the main benefit is actually
>> increasing scheduling opportunities for architectures where it matters).
> Right. They're both building superblocks, which has the effect of larger
> windows for scheduling, DCE, CSE, etc.
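
As a source-level illustration of the superblock effect discussed here, the
sketch below shows a loop whose if/else arms merge into a shared join block,
and a hand-written version with that join block duplicated into each arm.
Both functions and their names are invented for this example; the "after"
version is simplified by hand to show what CSE and DCE could then do, and is
not output from the pass.

```c
#include <assert.h>

/* Before: both arms of the if/else merge into one join block, where x
   is a PHI of the two arms and nothing is known about its value. */
int sum_before(const int *a, int n)
{
    assert(n >= 0);
    int sum = 0;
    for (int i = 0; i < n; i++) {
        int x;
        if (a[i] > 0)
            x = a[i] * 2;
        else
            x = 0;
        /* join block */
        sum += x * 2 + a[i] * 2;
    }
    return sum;
}

/* After: the join block is copied into each arm, so each copy sees a
   single definition of x and can be simplified on its own. */
int sum_after(const int *a, int n)
{
    assert(n >= 0);
    int sum = 0;
    for (int i = 0; i < n; i++) {
        if (a[i] > 0) {
            int x = a[i] * 2;
            sum += x * 2 + x;   /* a[i] * 2 is already in x (CSE) */
        } else {
            sum += a[i] * 2;    /* x is 0 here, so x * 2 is dead (DCE) */
        }
    }
    return sum;
}
```

The price is a copy of the join block per arm, which is the code growth the
thread is concerned about.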
>
>
>>
>> Note that both passes are placed quite late and thus won't see much
>> of the GIMPLE optimizations (DOM mainly). I wonder why they were not
>> placed adjacent to each other.
> Ajit had it fairly early, but that didn't play well with if-conversion.
> I just pushed it past if-conversion and vectorization, but before
> the last DOM pass. That turns out to be where tracer lives too as you noted.
>
>>>
>>> I wouldn't lose any sleep if we disabled by default or removed,
>>> particularly if we can repurpose Honza's code. In fact, I might
>>> strongly support the former until we hear back from Ajit on performance
>>> data.
>>
>> See above for what we do with -ftracer. path-splitting should at
>> _least_ restrict itself to operate on optimize_loop_for_speed_p () loops.
> I think we need to decide if we want the code at all, particularly
> given the multiple-exit problem.
>
> The difficulty is I think Ajit posted some recent data that shows it's
> helping. So maybe the thing to do is ask Ajit to try the tracer
> independent of path splitting and take the obvious actions based on
> Ajit's data.
>
>
>>
>> It should also (even if counter-intuitive) limit the amount of stmt
>> copying it does - after all there is sth like an instruction cache
>> size which exceeding for loops will never be a good idea (and even
>> smaller special loop caches on some archs).
> Yup.
>
>>
>> Note that a better heuristic than "at least more than one stmt" would
>> be to have at least one PHI in the merger block. Otherwise I don't
>> see how CSE opportunities could exist that we don't see without the duplication.
>> And yes, more PHIs -> more possible CSE. I wouldn't say so for the
>> number of stmts. So please limit the number of stmt copies!
>> (after all we do limit the number of stmts we copy during jump
>> threading!)
> Let's get some more data before we try to tune path splitting. In an
> ideal world, the tracer can handle this for us and we just remove path
> splitting completely.
>
> Jeff
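
The restrictions Richard suggests in the quoted text (only loops optimized
for speed, at least one PHI in the merge block, and a cap on the number of
copied statements) can be sketched as a single predicate. Everything below is
hypothetical: the struct, its fields, the function name and the limit of 16
are assumptions made for illustration and do not correspond to GCC's internal
data structures or to any existing parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of a candidate merge block; not a GCC type. */
struct join_block_info {
    int  num_phis;   /* PHI nodes in the merge block */
    int  num_stmts;  /* statements that would be duplicated */
    bool loop_hot;   /* stand-in for optimize_loop_for_speed_p () */
};

/* Assumed cap on copied statements, chosen arbitrarily for the sketch. */
enum { MAX_COPIED_STMTS = 16 };

/* Returns true when duplicating the merge block looks worthwhile under
   the three restrictions from the thread. */
bool should_split_path(const struct join_block_info *bb)
{
    assert(bb != NULL);
    if (!bb->loop_hot)
        return false;  /* do not grow code in loops optimized for size */
    if (bb->num_phis < 1)
        return false;  /* no PHI, so no new CSE opportunity is exposed */
    return bb->num_stmts <= MAX_COPIED_STMTS;
}
```

A real limit would more likely be tunable, in the same spirit as the limit on
statements copied during jump threading that Richard mentions.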