https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
--- Comment #58 from amker at gcc dot gnu.org ---
(In reply to Bill Schmidt from comment #56)
> (In reply to Bill Schmidt from comment #53)
> > I'm not a fan of a tree-level unroller. It's impossible to make good
> > decisions about unroll factors that early. But your second approach sounds
> > quite promising to me.
>
> I would be willing to soften this statement. I think that an early unroller
> might well be a profitable approach for most systems with large caches and
> so forth, where if the unrolling heuristics are not completely accurate we
> are still likely to make a reasonably good decision. However, I would
> expect to see ports with limited caches/memory to want more accurate control
> over unrolling decisions. So I could see allowing ports to select between a
> GIMPLE unroller and an RTL unroller (I doubt anybody would want both).

Thanks for the comments. As David suggested, we can try to implement a
relatively conservative unroller and make sure it's a win in most unrolled
cases, even if some opportunities are missed. Then we can enable it at the
-O3/-Ofast level, which I think would be welcome since we currently don't
have a general unroller by default.

> In general it seems like PowerPC could benefit from more aggressive
> unrolling much of the time, provided we can also solve the related IVOPTS
> problems that cause too much register spill.
>
> I may have an interest in working on a GIMPLE unroller, depending on how
> quickly I can complete or shed some other projects...

(In reply to rguent...@suse.de from comment #57)
> On Tue, 11 Aug 2015, wschmidt at gcc dot gnu.org wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
> >
> > --- Comment #56 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
> > (In reply to Bill Schmidt from comment #53)
> > > I'm not a fan of a tree-level unroller. It's impossible to make good
> > > decisions about unroll factors that early. But your second approach
> > > sounds quite promising to me.
> >
> > I would be willing to soften this statement. I think that an early
> > unroller might well be a profitable approach for most systems with large
> > caches and so forth, where if the unrolling heuristics are not completely
> > accurate we are still likely to make a reasonably good decision. However,
> > I would expect to see ports with limited caches/memory to want more
> > accurate control over unrolling decisions. So I could see allowing ports
> > to select between a GIMPLE unroller and an RTL unroller (I doubt anybody
> > would want both).
> >
> > In general it seems like PowerPC could benefit from more aggressive
> > unrolling much of the time, provided we can also solve the related IVOPTS
> > problems that cause too much register spill.
> >
> > I may have an interest in working on a GIMPLE unroller, depending on how
> > quickly I can complete or shed some other projects...
>
> I think that a separate unrolling on GIMPLE would be a hard sell
> due to the lack of a good cost model. _But_ doing unrolling as part
> of another transform like we are doing now makes sense. So does
> eventually moving parts of an RTL pass involving unrolling to
> GIMPLE, like modulo scheduling or SMS (leaving the scheduling part
> to RTL).

About the cost model: is it possible to introduce a cache information model
in GCC? I don't think it's a difficult problem, and it could be a start for
possible cache-sensitive optimizations in the future. Another general
question: what kinds of cost do we need in a fine unroller, besides
cache/branch ones?

> Note that the RTL unroller is not enabled by default by any optimization
> level and note that unfortunately the RTL unroller shares flags with
> the GIMPLE level complete peeling (where it mainly controls cost
> modeling).

Oh, but it's enabled with -fprofile-use.

> It's been a long time since I've done SPEC measuring with/without
> -funroll-loops (or/and -fpeel-loops). Note that these flags have
> secondary effects as well:
>
> toplev.c: flag_web = flag_unroll_loops || flag_peel_loops;
> toplev.c: flag_rename_registers = flag_unroll_loops || flag_peel_loops;
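
For readers following the thread, here is a small illustration of what is being discussed. It is only a sketch and is not GCC code: sum_unrolled4 shows the general shape an unroller gives a simple loop (factor 4 plus a remainder loop), and choose_unroll_factor is a purely hypothetical "conservative" heuristic that stops unrolling once the unrolled body would exceed an instruction budget. The function names, the power-of-two policy, and the budget parameter are all assumptions made for this example; GCC's real heuristics differ.

```c
#include <assert.h>
#include <stddef.h>

/* Reference loop: one add and one compare-and-branch per element.  */
long sum_rolled(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* The same loop unrolled by hand with factor 4: a main loop over
   groups of four elements, then a remainder loop for the last
   n % 4 elements.  Fewer branches, but a larger loop body.  */
long sum_unrolled4(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    for (; i < n; i++)
        s += a[i];
    return s;
}

/* HYPOTHETICAL heuristic (an assumption for illustration, not taken
   from GCC): return the largest power-of-two factor, capped at
   max_factor, such that body_insns * factor <= insn_budget.
   Returns 1 (do not unroll) when even factor 2 does not fit.  */
unsigned choose_unroll_factor(unsigned body_insns, unsigned insn_budget,
                              unsigned max_factor)
{
    assert(body_insns > 0);
    unsigned factor = 1;
    /* Divisions instead of multiplications avoid unsigned overflow.  */
    while (factor <= max_factor / 2
           && factor * 2 <= insn_budget / body_insns)
        factor *= 2;
    return factor;
}
```

A port with a small instruction cache would pass a small insn_budget and get a factor of 1 or 2, while a port with large caches could pass a generous budget; that is one simple way to picture the per-port control mentioned earlier in the thread.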