On Wed, Dec 16, 2015 at 8:43 AM, Ajit Kumar Agarwal
<ajit.kumar.agar...@xilinx.com> wrote:
> Hello Jeff:
>
> Here is more of a data you have asked for.
>
> SPEC FP benchmarks.
> a) No Path Splitting + tracer enabled
>    Geomean Score = 4749.726.
> b) Path Splitting enabled + tracer enabled.
>    Geomean Score = 4781.655.
>
> Conclusion: With both Path Splitting and tracer enabled we got maximum gains.
> I think we need to have Path Splitting pass.
>
> SPEC INT benchmarks.
> a) Path Splitting enabled + tracer not enabled.
>    Geomean Score = 3745.193.
> b) No Path Splitting + tracer enabled.
>    Geomean Score = 3738.558.
> c) Path Splitting enabled + tracer enabled.
>    Geomean Score = 3742.833.
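The relative differences between the geomean scores quoted above can be worked out directly. The sketch below is only arithmetic on the five quoted numbers; the constant names and the `percent_gain` helper are illustrative labels, not anything from the patch or from SPEC tooling.

```python
# Geomean scores exactly as quoted in the message; the names are ours.
FP_TRACER_ONLY = 4749.726
FP_SPLIT_AND_TRACER = 4781.655
INT_SPLIT_ONLY = 3745.193
INT_TRACER_ONLY = 3738.558
INT_SPLIT_AND_TRACER = 3742.833


def percent_gain(score: float, baseline: float) -> float:
    """Relative improvement of `score` over `baseline`, in percent."""
    return (score / baseline - 1.0) * 100.0


# Compare each path-splitting configuration against the tracer-only run.
fp_gain = percent_gain(FP_SPLIT_AND_TRACER, FP_TRACER_ONLY)
int_split_gain = percent_gain(INT_SPLIT_ONLY, INT_TRACER_ONLY)
int_both_gain = percent_gain(INT_SPLIT_AND_TRACER, INT_TRACER_ONLY)

print(f"FP:  split+tracer vs tracer only: {fp_gain:+.2f}%")
print(f"INT: split only   vs tracer only: {int_split_gain:+.2f}%")
print(f"INT: split+tracer vs tracer only: {int_both_gain:+.2f}%")
```

Measured against the tracer-only runs, the differences come out to roughly 0.67% for SPEC FP and roughly 0.18% and 0.11% for SPEC INT. The message does not say how much run-to-run variation these scores have, so the sketch makes no claim about significance.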

I suppose with SPEC you mean SPEC CPU 2006?

Can you disclose the architecture you did the measurements on and the
compile flags you used otherwise?

Note that tracer does a very good job only when paired with FDO so can you
re-run SPEC with FDO and compare with path-splitting enabled on top of that?

Thanks,
Richard.

> Conclusion: We are getting more gains with Path Splitting as compared to
> tracer. With both Path Splitting and tracer enabled we are also getting
> gains.
> I think we should have Path Splitting pass.
>
> One more observation: Richard's concern is the creation of multiple exits
> with Splitting paths through duplication. My observation is, in tracer pass
> also there is a creation of multiple exits through duplication. I don’t
> think that’s an issue with the practicality considering the gains we are
> getting with Splitting paths with more PRE, CSE and DCE.
>
> Thanks & Regards
> Ajit
>
> -----Original Message-----
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: Wednesday, December 16, 2015 5:20 AM
> To: Richard Biener
> Cc: Ajit Kumar Agarwal; GCC Patches; Vinod Kathail; Shail Aditya Gupta;
> Vidhumouli Hunsigida; Nagaraju Mekala
> Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree
> ssa representation
>
> On 12/11/2015 03:05 AM, Richard Biener wrote:
>> On Thu, Dec 10, 2015 at 9:08 PM, Jeff Law <l...@redhat.com> wrote:
>>> On 12/03/2015 07:38 AM, Richard Biener wrote:
>>>>
>>>> This pass is now enabled by default with -Os but has no limits on
>>>> the amount of stmts it copies.
>>>
>>> The more statements it copies, the more likely it is that the path
>>> splitting will turn out to be useful! It's counter-intuitive.
>>
>> Well, it's still not appropriate for -Os (nor -O2 I think). -ftracer
>> is enabled with -fprofile-use (but it is also properly driven to only
>> trace hot paths) and otherwise not by default at any optimization level.
> Definitely not appropriate for -Os.
> But as I mentioned, I really want to look at the tracer code as it may
> totally subsume path splitting.
>
>>
>> Don't see how this would work for the CFG pattern it operates on
>> unless you duplicate the exit condition into that new block creating
>> an even more obfuscated CFG.
> Agreed, I don't see any way to fix the multiple exit problem. Then again,
> this all runs after the tree loop optimizer, so I'm not sure how big of an
> issue it is in practice.
>
>
>>> It was only after I approved this code after twiddling it for Ajit
>>> that I came across Honza's tracer implementation, which may in fact
>>> be retargettable to these loops and do a better job. I haven't
>>> experimented with that.
>>
>> Well, I originally suggested to merge this with the tracer pass...
> I missed that, or it didn't sink into my brain.
>
>>> Again, the more statements it copies the more likely it is to be profitable.
>>> Think superblocks to expose CSE, DCE and the like.
>>
>> Ok, so similar to tracer (where I think the main benefit is actually
>> increasing scheduling opportunities for architectures where it matters).
> Right. They're both building superblocks, which has the effect of larger
> windows for scheduling, DCE, CSE, etc.
>
>
>>
>> Note that both passes are placed quite late and thus won't see much
>> of the GIMPLE optimizations (DOM mainly). I wonder why they were
>> not placed adjacent to each other.
> Ajit had it fairly early, but that didn't play well with if-conversion.
> I just pushed it past if-conversion and vectorization, but before the
> last DOM pass. That turns out to be where tracer lives too as you noted.
>
>>>
>>> I wouldn't lose any sleep if we disabled by default or removed, particularly
>>> if we can repurpose Honza's code. In fact, I might strongly support the
>>> former until we hear back from Ajit on performance data.
>>
>> See above for what we do with -ftracer.
>> path-splitting should at _least_ restrict itself to operate on
>> optimize_loop_for_speed_p () loops.
> I think we need to decide if we want the code at all, particularly given
> the multiple-exit problem.
>
> The difficulty is I think Ajit posted some recent data that shows it's
> helping. So maybe the thing to do is ask Ajit to try the tracer
> independent of path splitting and take the obvious actions based on
> Ajit's data.
>
>
>>
>> It should also (even if counter-intuitive) limit the amount of stmt copying
>> it does - after all there is sth like an instruction cache size which
>> exceeding for loops will never be a good idea (and even smaller special
>> loop caches on some archs).
> Yup.
>
>>
>> Note that a better heuristic than "at least more than one stmt" would be
>> to have at least one PHI in the merger block. Otherwise I don't see how
>> CSE opportunities could exist we don't see without the duplication.
>> And yes, more PHIs -> more possible CSE. I wouldn't say so for
>> the number of stmts. So please limit the number of stmt copies!
>> (after all we do limit the number of stmts we copy during jump threading!)
> Let's get some more data before we try to tune path splitting. In an
> ideal world, the tracer can handle this for us and we just remove path
> splitting completely.
>
> Jeff
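The three restrictions suggested in the quoted review (only hot loops, at least one PHI in the merge block, a cap on copied statements) can be combined into a single profitability check. What follows is a toy model in Python, not GCC code: `JoinBlock`, `should_split`, and the value of `MAX_COPIED_STMTS` are all hypothetical, and `loop_is_hot` merely stands in for what `optimize_loop_for_speed_p ()` would report.

```python
from dataclasses import dataclass

# Hypothetical cap on duplicated statements. The thread asks for a limit
# but does not name a value, so this number is an assumption.
MAX_COPIED_STMTS = 8


@dataclass
class JoinBlock:
    """Toy stand-in for the merge block below an if-then-else in a loop."""
    num_phis: int      # PHI nodes merging values from the two arms
    num_stmts: int     # statements that duplication would copy
    loop_is_hot: bool  # stand-in for optimize_loop_for_speed_p ()


def should_split(block: JoinBlock, max_stmts: int = MAX_COPIED_STMTS) -> bool:
    """Return True if duplicating the merge block looks worthwhile."""
    # Never grow code in loops that are being optimized for size.
    if not block.loop_is_hot:
        return False
    # Without a PHI there is no merged value for CSE to specialize on.
    if block.num_phis < 1:
        return False
    # Bound code growth, as jump threading already does.
    return block.num_stmts <= max_stmts


# A small hot merge block with PHIs qualifies; the variants do not.
print(should_split(JoinBlock(num_phis=2, num_stmts=5, loop_is_hot=True)))
print(should_split(JoinBlock(num_phis=0, num_stmts=5, loop_is_hot=True)))
print(should_split(JoinBlock(num_phis=2, num_stmts=50, loop_is_hot=True)))
```

The ordering of the checks does not affect the result; it only mirrors the order in which the restrictions come up in the thread. Whether any of these limits should be adopted is exactly what the thread leaves open pending more data.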