On Wed, Dec 16, 2015 at 8:43 AM, Ajit Kumar Agarwal
<ajit.kumar.agar...@xilinx.com> wrote:
> Hello Jeff:
>
> Here is more of the data you asked for.
>
> SPEC FP benchmarks.
> a) No Path Splitting + tracer enabled
>     Geomean Score =  4749.726.
> b) Path Splitting enabled + tracer enabled.
>     Geomean Score =  4781.655.
>
> Conclusion: With both Path Splitting and tracer enabled we got maximum gains. 
> I think we need to have Path Splitting pass.
>
> SPEC INT benchmarks.
> a) Path Splitting enabled + tracer not enabled.
>     Geomean Score =  3745.193.
> b) No Path Splitting + tracer enabled.
>     Geomean Score = 3738.558.
> c) Path Splitting enabled + tracer enabled.
>     Geomean Score = 3742.833.
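
For scale, the geomean scores quoted above can be turned into relative differences. The short Python sketch below does only that arithmetic. The helper name `relative_gain_percent` is made up for illustration, and using the tracer-only run as the SPEC INT baseline is an assumption. Every difference comes out well under one percent.

```python
def relative_gain_percent(baseline, candidate):
    """Relative change of candidate over baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0

# SPEC FP: tracer only (a) vs. path splitting + tracer (b).
fp_both = relative_gain_percent(4749.726, 4781.655)

# SPEC INT: the baseline is assumed to be the tracer-only run (b).
int_split_only = relative_gain_percent(3738.558, 3745.193)  # run (a)
int_both = relative_gain_percent(3738.558, 3742.833)        # run (c)

for label, gain in [
    ("FP, splitting + tracer vs. tracer", fp_both),
    ("INT, splitting only vs. tracer", int_split_only),
    ("INT, splitting + tracer vs. tracer", int_both),
]:
    print(f"{label}: {gain:+.3f}%")
```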

I suppose by SPEC you mean SPEC CPU 2006?

Can you disclose the architecture you did the measurements on and the
other compile flags you used?

Note that tracer does a very good job only when paired with FDO, so can
you re-run SPEC with FDO and compare with path-splitting enabled on top
of that?
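
A minimal sketch of the build configurations such an FDO comparison might use, written as Python that only assembles command lines. The `-O2` base level and the `-fsplit-paths` / `-fno-split-paths` spelling of the path-splitting toggle are assumptions and are not confirmed in this thread; `-fprofile-generate`, `-fprofile-use` and `-ftracer` are existing GCC options.

```python
# Assumptions: -O2 as the base level and -fsplit-paths as the flag spelling.
BASE = ["-O2"]

def training_flags():
    """Flags for the instrumented build that collects the profile."""
    return BASE + ["-fprofile-generate"]

def measurement_flags(split_paths):
    """Flags for the profile-guided build, with tracer requested explicitly."""
    toggle = "-fsplit-paths" if split_paths else "-fno-split-paths"
    return BASE + ["-fprofile-use", "-ftracer", toggle]

# Print the two measurement configurations to compare.
for split in (False, True):
    print(" ".join(["gcc"] + measurement_flags(split)))
```

The only difference between the two measured builds is the path-splitting toggle, so any score change can be attributed to that pass on top of FDO and tracer.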

Thanks,
Richard.

> Conclusion: We are getting more gains with Path Splitting as compared to 
> tracer. With both Path Splitting and tracer enabled we are also getting  
> gains.
> I think we should have Path Splitting pass.
>
> One more observation: Richard's concern is the creation of multiple exits
> when splitting paths through duplication. My observation is that the tracer
> pass also creates multiple exits through duplication. I don't think that's
> an issue in practice, considering the gains we are getting from splitting
> paths through more PRE, CSE and DCE.
>
> Thanks & Regards
> Ajit
>
>
>
>
> -----Original Message-----
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: Wednesday, December 16, 2015 5:20 AM
> To: Richard Biener
> Cc: Ajit Kumar Agarwal; GCC Patches; Vinod Kathail; Shail Aditya Gupta; 
> Vidhumouli Hunsigida; Nagaraju Mekala
> Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree 
> ssa representation
>
> On 12/11/2015 03:05 AM, Richard Biener wrote:
>> On Thu, Dec 10, 2015 at 9:08 PM, Jeff Law <l...@redhat.com> wrote:
>>> On 12/03/2015 07:38 AM, Richard Biener wrote:
>>>>
>>>> This pass is now enabled by default with -Os but has no limits on
>>>> the amount of stmts it copies.
>>>
>>> The more statements it copies, the more likely it is that the path
>>> splitting will turn out to be useful!  It's counter-intuitive.
>>
>> Well, it's still not appropriate for -Os (nor -O2 I think).  -ftracer
>> is enabled with -fprofile-use (but it is also properly driven to only
>> trace hot paths) and otherwise not by default at any optimization level.
> Definitely not appropriate for -Os.  But as I mentioned, I really want to 
> look at the tracer code as it may totally subsume path splitting.
>
>>
>> Don't see how this would work for the CFG pattern it operates on
>> unless you duplicate the exit condition into that new block creating
>> an even more obfuscated CFG.
> Agreed, I don't see any way to fix the multiple exit problem.  Then again, 
> this all runs after the tree loop optimizer, so I'm not sure how big of an 
> issue it is in practice.
>
>
>>> It was only after I approved this code after twiddling it for Ajit
>>> that I came across Honza's tracer implementation, which may in fact
>>> be retargetable to these loops and do a better job.  I haven't
>>> experimented with that.
>>
>> Well, I originally suggested to merge this with the tracer pass...
> I missed that, or it didn't sink into my brain.
>
>>> Again, the more statements it copies the more likely it is to be profitable.
>>> Think superblocks to expose CSE, DCE and the like.
>>
>> Ok, so similar to tracer (where I think the main benefit is actually
>> increasing scheduling opportunities for architectures where it matters).
> Right.  They're both building superblocks, which has the effect of larger 
> windows for scheduling, DCE, CSE, etc.
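
The superblock effect can be illustrated with a hand-written analogy in Python. This is not GCC output and the function names are hypothetical. In the first version the join block computes `x * c`, where `x` merges a value from each arm. In the second version the join block has been duplicated into both arms, and on the "then" path `x * c` is visibly the same value as the already computed `p`, so it can be reused.

```python
def body_original(a, b, c, flag):
    if flag:
        p = (a + b) * c
        x = a + b
    else:
        p = 0
        x = a - b
    # Join block: x merges the two arm values (a PHI in SSA terms).
    y = x * c
    return p + y

def body_split(a, b, c, flag):
    if flag:
        p = (a + b) * c
        # Duplicated join: x * c equals p on this path, so reuse it.
        y = p
        return p + y
    else:
        x = a - b
        # Duplicated join: nothing to reuse here, compute as before.
        y = x * c
        return y  # p is 0 on this path

# Both versions compute the same result for every input tried.
for flag in (True, False):
    for a, b, c in [(1, 2, 3), (-4, 5, 0), (7, 7, -2)]:
        assert body_original(a, b, c, flag) == body_split(a, b, c, flag)
```

The cost is the copied join block and a second exit from the region, which is the trade-off discussed in this thread.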
>
>
>>
>> Note that both passes are placed quite late and thus won't see much
>> of the GIMPLE optimizations (DOM mainly).  I wonder why they were
>> not placed adjacent to each other.
> Ajit had it fairly early, but that didn't play well with if-conversion.
> I just pushed it past if-conversion and vectorization, but before the
> last DOM pass.  That turns out to be where tracer lives too as you noted.
>
>>>
>>> I wouldn't lose any sleep if we disabled it by default or removed it,
>>> particularly if we can repurpose Honza's code.  In fact, I might strongly support the
>>> former until we hear back from Ajit on performance data.
>>
>> See above for what we do with -ftracer.  path-splitting should at _least_
>> restrict itself to operate on optimize_loop_for_speed_p () loops.
> I think we need to decide if we want the code at all, particularly given
> the multiple-exit problem.
>
> The difficulty is I think Ajit posted some recent data that shows it's
> helping.  So maybe the thing to do is ask Ajit to try the tracer
> independent of path splitting and take the obvious actions based on
> Ajit's data.
>
>
>>
>> It should also (even if counter-intuitive) limit the amount of stmt copying
>> it does - after all there is something like an instruction cache size, and
>> exceeding it for loops will never be a good idea (and there are even smaller
>> special loop caches on some archs).
> Yup.
>
>>
>> Note that a better heuristic than "at least more than one stmt" would be
>> to have at least one PHI in the merger block.  Otherwise I don't see how
>> CSE opportunities could exist that we don't see without the duplication.
>> And yes, more PHIs -> more possible CSE.  I wouldn't say so for
>> the number of stmts.  So please limit the number of stmt copies!
>> (after all we do limit the number of stmts we copy during jump threading!)
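
The suggested restrictions can be sketched as a small profitability check. This is a toy model in Python, not the pass's actual code: `JoinBlock`, `should_split` and the limit of 20 statements are hypothetical, and the `loop_optimized_for_speed` argument merely stands in for the result of `optimize_loop_for_speed_p ()`.

```python
from dataclasses import dataclass

@dataclass
class JoinBlock:
    """Toy summary of the block that would be duplicated."""
    num_phis: int   # PHI nodes at the merge point
    num_stmts: int  # statements that would be copied

# Arbitrary illustrative limit, not a value taken from GCC.
MAX_COPIED_STMTS = 20

def should_split(block, loop_optimized_for_speed, max_stmts=MAX_COPIED_STMTS):
    """Return True if duplicating the join block looks worthwhile."""
    if not loop_optimized_for_speed:
        return False  # never grow code in loops optimized for size
    if block.num_phis < 1:
        return False  # no PHI means no new CSE opportunity is exposed
    return block.num_stmts <= max_stmts  # cap the amount of copying

print(should_split(JoinBlock(num_phis=2, num_stmts=8), True))
```

Ordering the checks from cheapest to most specific keeps the rejection cases obvious: size-optimized loops first, then blocks without PHIs, then blocks that are too large to copy.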
> Let's get some more data before we try to tune path splitting.  In an
> ideal world, the tracer can handle this for us and we just remove path
> splitting completely.
>
> Jeff
