On Fri, Mar 20, 2026 at 8:09 PM David Malcolm via Gcc <[email protected]> wrote:
>
> On Fri, 2026-03-20 at 10:55 -0500, Robert Dubner wrote:
> >
> >
> > > -----Original Message-----
> > > From: David Malcolm <[email protected]>
> > > Sent: Friday, March 20, 2026 10:10
> > > To: Robert Dubner <[email protected]>; [email protected]
> > > Subject: Re: COBOL: Hoping for insight with middle-end computation
> > > time.
> > >
> > > On Thu, 2026-03-19 at 18:22 -0500, Robert Dubner wrote:
> > > > It happens that COBOL has the COMPUTE statement. It takes the form
> > > > of, for example, COMPUTE DDD = AAA + BBB.
> > > >
> > > > We implement that by creating a temporary variable, using that as
> > > > the target of an addition of AAA and BBB, and then doing an
> > > > assignment to DDD. (Recall that COBOL variables can be quite
> > > > complex, so we are a long way from being able to do this with an
> > > > ADD_EXPR.)
> > > >
> > > > We have determined that the way I've been producing GENERIC for that
> > > > results in N-squared computation time somewhere in the middle end.
> > > >
> > > > I have been tearing my hair out trying to figure out what's causing
> > > > that N-squared behavior. I commented away the assignment, and I got
> > > > rid of the arithmetic. All that's left is the creation of the
> > > > temporary, and some IF statements that are generated to test for
> > > > errors along the way of the computation. (COBOL has a very rich
> > > > error-detection and exception-generating facility.)
> > > >
> > > > The remaining GIMPLE for a single iteration (as shown by
> > > > -fdump-tree-gimple) is shown below.
> > > > The "phase opt and generate" times for repetitions of that GIMPLE
> > > > are shown here:
> > > >
> > > >             phase opt     Factor
> > > > Repeats     & generate
> > > >   1,000        0.17
> > > >   2,000        0.49        2.9
> > > >   4,000        1.55        3.2
> > > >   8,000        7.56        4.9
> > > >  16,000       49.31        6.5
> > > >  32,000      281.29        5.7
> > > >
> > > > I have been struggling with this for days. Is there an explanation
> > > > for why the following GIMPLE is resulting in that N-squared behavior?
> > >
> > > Hi Bob. You cite the overall "phase opt and generate" times, but no
> > > data on how this is spread across the various optimization passes.
> > >
> > > What's the output of -ftime-report on your workload?
> > >
> > > In particular, are there any specific passes that are responsible for
> > > the growth in time (and thus where we can pinpoint a bug), or is the
> > > time evenly distributed across all of them?
> > >
> > > Sorry if this is a silly question
> > > Dave
> >
> > The only silly thing is that in an egregious display of monumental
> > arrogance and ignorance, some years back I decided to take on the
> > problem of code generation for a new front end without having had any
> > prior experience with GCC internals or compiler theory, and without
> > access to anybody who actually knew anything.
> >
> > I hope you've seen my response to Richard. For a compilation with
> >
> > phase opt and generate          :  14.95 ( 92%)   259M ( 88%)
> >
> > the next big component is
> >
> > thread pro- & epilogue          :  10.45 ( 64%)   4104 (  0%)
> >
> > What that means is yet one more mystery to me.
>
> 64% of the wallclock time is being accounted to this timing item.
> Looking in timevar.def, I see that this timing item is:
>
> DEFTIMEVAR (TV_THREAD_PROLOGUE_AND_EPILOGUE, "thread pro- & epilogue")
>
> Grepping for TV_THREAD_PROLOGUE_AND_EPILOGUE, I see two passes in
> function.cc that account their time to this timevar:
> pass_thread_prologue_and_epilogue and
> pass_late_thread_prologue_and_epilogue.
>
> Both of these passes ultimately call the function
> rest_of_handle_thread_prologue_and_epilogue.
>
> So the slowdown is presumably somewhere inside there.

Possibly shrink-wrapping, try -fno-shrink-wrap

Other than that - clearly a bug. Care to file a bugzilla for this
compile-time hog?

Richard.

> Dave
>
> >
> > Here's the entire output of -ftime-report-details:
> >
> > Time variable                       wall           GGC
> > phase setup                     :   0.01 (  0%)   150k (  0%)
> > phase parsing                   :   1.33 (  8%)    36M ( 12%)
> > phase opt and generate          :  14.95 ( 92%)   259M ( 88%)
> > phase last asm                  :   0.02 (  0%)   377k (  0%)
> > phase finalize                  :   0.01 (  0%)      0 (  0%)
> > garbage collection              :   0.09 (  1%)      0 (  0%)
> > callgraph construction          :   0.13 (  1%)    12M (  4%)
> > `- CFG verifier                 :   0.02 (  0%)      0 (  0%)
> > `- tree STMT verifier           :   0.16 (  1%)      0 (  0%)
> > `- symout                       :   0.01 (  0%)    10M (  4%)
> > `- tree SSA verifier            :   0.04 (  0%)      0 (  0%)
> > `- garbage collection           :   0.01 (  0%)      0 (  0%)
> > callgraph optimization          :   0.04 (  0%)      0 (  0%)
> > `- dominance computation        :   0.01 (  0%)      0 (  0%)
> > `- CFG verifier                 :   0.01 (  0%)      0 (  0%)
> > `- tree STMT verifier           :   0.21 (  1%)      0 (  0%)
> > `- tree SSA verifier            :   0.06 (  0%)      0 (  0%)
> > `- garbage collection           :   0.03 (  0%)      0 (  0%)
> > callgraph ipa passes            :   1.14 (  7%)    35M ( 12%)
> > ipa function summary            :   0.00 (  0%)   1832 (  0%)
> > `- tree SSA verifier            :   0.01 (  0%)      0 (  0%)
> > `- tree STMT verifier           :   0.02 (  0%)      0 (  0%)
> > ipa inlining heuristics         :   0.00 (  0%)      0 (  0%)
> > `- tree SSA verifier            :   0.01 (  0%)      0 (  0%)
> > `- tree STMT verifier           :   0.02 (  0%)      0 (  0%)
> > ipa comdats                     :   0.00 (  0%)      0 (  0%)
> > `- tree SSA verifier            :   0.01 (  0%)      0 (  0%)
> > `- tree STMT verifier           :   0.02 (  0%)      0 (  0%)
> > ipa free lang data              :   0.00 (  0%)      0 (  0%)
> > `- tree STMT verifier           :   0.02 (  0%)      0 (  0%)
> > ipa free inline summary         :   0.00 (  0%)      0 (  0%)
> > `- tree SSA verifier            :   0.01 (  0%)      0 (  0%)
> > `- tree STMT verifier           :   0.04 (  0%)      0 (  0%)
> > ipa modref                      :   0.00 (  0%)      0 (  0%)
> > `- tree SSA verifier            :   0.01 (  0%)      0 (  0%)
> > `- tree STMT verifier           :   0.02 (  0%)      0 (  0%)
> > cfg construction                :   0.01 (  0%)   1232 (  0%)
> > `- rebuild jump labels          :   0.01 (  0%)      0 (  0%)
> > `- verify RTL sharing           :   0.03 (  0%)      0 (  0%)
> > `- CFG verifier                 :   0.03 (  0%)      0 (  0%)
> > cfg cleanup                     :   0.01 (  0%)    208 (  0%)
> > `- CFG verifier                 :   0.03 (  0%)      0 (  0%)
> > CFG verifier                    :   0.40 (  2%)      0 (  0%)
> > trivially dead code             :   0.02 (  0%)      0 (  0%)
> > df scan insns                   :   0.07 (  0%)     96 (  0%)
> > `- verify RTL sharing           :   0.01 (  0%)      0 (  0%)
> > `- CFG verifier                 :   0.01 (  0%)      0 (  0%)
> > df live regs                    :   0.03 (  0%)      0 (  0%)
> > df reg dead/unused notes        :   0.03 (  0%)  2628k (  1%)
> > register information            :   0.01 (  0%)      0 (  0%)
> > alias analysis                  :   0.01 (  0%)  1024k (  0%)
> > rebuild jump labels             :   0.01 (  0%)    168 (  0%)
> > parser (global)                 :   1.33 (  8%)    36M ( 12%)
> > early inlining heuristics       :   0.00 (  0%)      0 (  0%)
> > `- tree SSA verifier            :   0.01 (  0%)      0 (  0%)
> > `- tree STMT verifier           :   0.02 (  0%)      0 (  0%)
> > inline parameters               :   0.03 (  0%)    672 (  0%)
> > `- tree SSA verifier            :   0.02 (  0%)      0 (  0%)
> > `- tree STMT verifier           :   0.04 (  0%)      0 (  0%)
> > tree gimplify                   :   0.09 (  1%)    50M ( 17%)
> > tree eh                         :   0.01 (  0%)    584 (  0%)
> > tree CFG construction           :   0.02 (  0%)    13M (  5%)
> > `- tree STMT verifier           :   0.02 (  0%)      0 (  0%)
> > tree CFG cleanup                :   0.02 (  0%)      0 (  0%)
> > `- tree SSA verifier            :   0.01 (  0%)      0 (  0%)
> > `- tree STMT verifier           :   0.02 (  0%)      0 (  0%)
> > `- dominance computation        :   0.01 (  0%)      0 (  0%)
> > `- CFG verifier                 :   0.01 (  0%)      0 (  0%)
> > tree SSA other                  :   0.00 (  0%)      0 (  0%)
> > `- tree SSA verifier            :   0.01 (  0%)      0 (  0%)
> > `- tree STMT verifier           :   0.02 (  0%)      0 (  0%)
> > tree SSA rewrite                :   0.03 (  0%)    20M (  7%)
> > `- tree STMT verifier           :   0.02 (  0%)      0 (  0%)
> > `- tree operand scan            :   0.01 (  0%)  8470k (  3%)
> > `- tree SSA verifier            :   0.01 (  0%)      0 (  0%)
> > tree operand scan               :   0.01 (  0%)  8470k (  3%)
> > tree SSA verifier               :   0.36 (  2%)      0 (  0%)
> > `- dominance computation        :   0.06 (  0%)      0 (  0%)
> > tree STMT verifier              :   0.88 (  5%)      0 (  0%)
> > tree switch lowering            :   0.00 (  0%)      0 (  0%)
> > `- tree SSA verifier            :   0.01 (  0%)      0 (  0%)
> > `- tree STMT verifier           :   0.02 (  0%)      0 (  0%)
> > callgraph verifier              :   0.06 (  0%)      0 (  0%)
> > `- callgraph verifier           :   0.06 (  0%)      0 (  0%)
> > dominance computation           :   0.12 (  1%)      0 (  0%)
> > out of ssa                      :   0.03 (  0%)    952 (  0%)
> > expand vars                     :   0.01 (  0%)  7040k (  2%)
> > expand                          :   0.16 (  1%)    86M ( 29%)
> > `- CFG verifier                 :   0.01 (  0%)      0 (  0%)
> > `- verify RTL sharing           :   0.01 (  0%)      0 (  0%)
> > `- out of ssa                   :   0.03 (  0%)    952 (  0%)
> > `- expand vars                  :   0.01 (  0%)  7040k (  2%)
> > `- post expand cleanups         :   0.01 (  0%)     96 (  0%)
> > post expand cleanups            :   0.01 (  0%)   3712 (  0%)
> > `- rebuild jump labels          :   0.01 (  0%)    168 (  0%)
> > `- CFG verifier                 :   0.01 (  0%)      0 (  0%)
> > jump                            :   0.00 (  0%)      0 (  0%)
> > `- verify RTL sharing           :   0.03 (  0%)      0 (  0%)
> > `- trivially dead code          :   0.01 (  0%)      0 (  0%)
> > `- CFG verifier                 :   0.02 (  0%)      0 (  0%)
> > loop init                       :   0.01 (  0%)   3040 (  0%)
> > mode switching                  :   0.00 (  0%)      0 (  0%)
> > `- verify RTL sharing           :   0.01 (  0%)      0 (  0%)
> > `- CFG verifier                 :   0.01 (  0%)      0 (  0%)
> > integrated RA                   :   0.49 (  3%)    12M (  4%)
> > `- register information         :   0.01 (  0%)      0 (  0%)
> > `- verify RTL sharing           :   0.02 (  0%)      0 (  0%)
> > `- df reg dead/unused notes     :   0.03 (  0%)  2628k (  1%)
> > `- alias analysis               :   0.01 (  0%)  1024k (  0%)
> > `- trivially dead code          :   0.01 (  0%)      0 (  0%)
> > `- CFG verifier                 :   0.01 (  0%)      0 (  0%)
> > `- df live regs                 :   0.01 (  0%)      0 (  0%)
> > LRA non-specific                :   0.22 (  1%)    352 (  0%)
> > `- LRA hard reg assignment      :   0.01 (  0%)      0 (  0%)
> > `- LRA virtuals elimination     :   0.17 (  1%)    23M (  8%)
> > `- LRA create live ranges       :   0.01 (  0%)      0 (  0%)
> > LRA virtuals elimination        :   0.17 (  1%)    23M (  8%)
> > LRA create live ranges          :   0.01 (  0%)      0 (  0%)
> > LRA hard reg assignment         :   0.01 (  0%)      0 (  0%)
> > reload                          :   0.00 (  0%)     48 (  0%)
> > `- verify RTL sharing           :   0.02 (  0%)      0 (  0%)
> > `- integrated RA                :   0.05 (  0%)     96 (  0%)
> > `- LRA non-specific             :   0.06 (  0%)     24 (  0%)
> > `- CFG verifier                 :   0.01 (  0%)      0 (  0%)
> > thread pro- & epilogue          :  10.45 ( 64%)   4104 (  0%)
> > `- CFG verifier                 :   0.03 (  0%)      0 (  0%)
> > `- verify RTL sharing           :   0.02 (  0%)      0 (  0%)
> > `- df live regs                 :   0.01 (  0%)      0 (  0%)
> > machine dep reorg               :   0.00 (  0%)      0 (  0%)
> > `- verify RTL sharing           :   0.02 (  0%)      0 (  0%)
> > shorten branches                :   0.05 (  0%)      0 (  0%)
> > `- verify RTL sharing           :   0.02 (  0%)      0 (  0%)
> > reg stack                       :   0.00 (  0%)      0 (  0%)
> > `- verify RTL sharing           :   0.04 (  0%)      0 (  0%)
> > `- CFG verifier                 :   0.02 (  0%)      0 (  0%)
> > final                           :   0.10 (  1%)  3887k (  1%)
> > `- verify RTL sharing           :   0.04 (  0%)      0 (  0%)
> > `- symout                       :   0.01 (  0%)  7192k (  2%)
> > symout                          :   0.04 (  0%)    18M (  6%)
> > access analysis                 :   0.06 (  0%)     24 (  0%)
> > `- tree SSA verifier            :   0.02 (  0%)      0 (  0%)
> > `- tree STMT verifier           :   0.04 (  0%)      0 (  0%)
> > `- dominance computation        :   0.01 (  0%)      0 (  0%)
> > early local passes              :   0.00 (  0%)      0 (  0%)
> > `- tree STMT verifier           :   0.02 (  0%)      0 (  0%)
> > rest of compilation             :   0.10 (  1%)  1754k (  1%)
> > `- dominance computation        :   0.01 (  0%)      0 (  0%)
> > `- verify RTL sharing           :   0.24 (  1%)      0 (  0%)
> > `- garbage collection           :   0.04 (  0%)      0 (  0%)
> > `- tree STMT verifier           :   0.13 (  1%)      0 (  0%)
> > `- tree SSA verifier            :   0.06 (  0%)      0 (  0%)
> > `- CFG verifier                 :   0.13 (  1%)      0 (  0%)
> > unaccounted post reload         :   0.00 (  0%)      0 (  0%)
> > `- verify RTL sharing           :   0.02 (  0%)      0 (  0%)
> > `- CFG verifier                 :   0.01 (  0%)      0 (  0%)
> > unaccounted late compilation    :   0.00 (  0%)      0 (  0%)
> > `- verify RTL sharing           :   0.02 (  0%)      0 (  0%)
> > `- CFG verifier                 :   0.01 (  0%)      0 (  0%)
> > verify RTL sharing              :   0.54 (  3%)      0 (  0%)
> > TOTAL                           :  16.32          296M
> >
> > > >
> > > > I agree that the conditionals are a bit convoluted, and Jim and I
> > > > are working to do COMPUTE in a much improved way. But I still feel
> > > > a need to understand what's going on!
> > > >
> > > > Thanks so much for any insight into what might be happening here.
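
A minimal sketch for picking the dominant entries out of a
-ftime-report-details dump like the one quoted above. It is plain Python
and not part of GCC; the name top_timevars and the regular expression are
illustrative assumptions based only on the row layout shown in this
thread, so other GCC versions may need adjustments.

```python
import re

# Matches rows such as
#   "thread pro- & epilogue : 10.45 ( 64%) 4104 ( 0%)"
# The optional "`- " prefix marks nested (child) entries in the
# -ftime-report-details layout quoted in this thread.
ROW = re.compile(
    r"^(?P<child>`-\s*)?(?P<name>.+?)\s*:\s*"
    r"(?P<wall>\d+\.\d+)\s*\(\s*(?P<pct>\d+)%\)"
)


def top_timevars(report, n=5):
    """Return the n top-level entries with the largest wall time.

    Skips the "phase ..." summary rows and nested "`- " rows so that
    individual passes are compared with each other.
    """
    rows = []
    for line in report.splitlines():
        # Drop mail quoting ("> > ") and surrounding whitespace.
        line = line.lstrip("> ").strip()
        m = ROW.match(line)
        if not m or m.group("child"):
            continue
        name = m.group("name")
        if name.startswith("phase "):
            continue
        rows.append((float(m.group("wall")), name))
    rows.sort(reverse=True)
    return rows[:n]


# A few rows copied from the report in this thread.
SAMPLE = """\
> > phase opt and generate : 14.95 ( 92%) 259M ( 88%)
> > integrated RA : 0.49 ( 3%) 12M ( 4%)
> > `- CFG verifier : 0.01 ( 0%) 0 ( 0%)
> > thread pro- & epilogue : 10.45 ( 64%) 4104 ( 0%)
> > tree STMT verifier : 0.88 ( 5%) 0 ( 0%)
> > TOTAL : 16.32 296M
"""

for wall, name in top_timevars(SAMPLE, 3):
    print(f"{wall:8.2f}  {name}")
```

Comparing the top entries with and without -fno-shrink-wrap would show
whether the time moves out of "thread pro- & epilogue".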
> > > >
> > > > _intermediate__stack501_503.0.0.data = &_stack501_data_517.0;
> > > > _intermediate__stack501_503.0.0.capacity = 16;
> > > > _intermediate__stack501_503.0.0.allocated = 16;
> > > > _intermediate__stack501_503.0.0.offset = 0;
> > > > _intermediate__stack501_503.0.0.name = &"_stack501"[0];
> > > > _intermediate__stack501_503.0.0.picture = &""[0];
> > > > _intermediate__stack501_503.0.0.initial = 0B;
> > > > _intermediate__stack501_503.0.0.parent = 0B;
> > > > _intermediate__stack501_503.0.0.occurs_lower = 0;
> > > > _intermediate__stack501_503.0.0.occurs_upper = 0;
> > > > _intermediate__stack501_503.0.0.attr = 4160;
> > > > _intermediate__stack501_503.0.0.type = 6;
> > > > _intermediate__stack501_503.0.0.level = 0;
> > > > _intermediate__stack501_503.0.0.digits = 37;
> > > > _intermediate__stack501_503.0.0.rdigits = 0;
> > > > _intermediate__stack501_503.0.0.encoding = 1;
> > > > _intermediate__stack501_503.0.0.alphabet = 0;
> > > > D.2812 = 0;
> > > > D.2813 = 0;
> > > > _1013 = D.2812 & 18;
> > > > if (_1013 != 0) goto <D.9353>; else goto <D.9354>;
> > > > <D.9353>:
> > > > goto <D.9355>;
> > > > <D.9354>:
> > > > D.2814 = 0;
> > > > ..pa_erf.1519_1014 = ..pa_erf;
> > > > D.2813 = D.2813 | ..pa_erf.1519_1014;
> > > > if (D.2813 != 0) goto <D.9357>; else goto <D.9358>;
> > > > <D.9357>:
> > > > goto <D.9359>;
> > > > <D.9358>:
> > > > <D.9359>:
> > > > <D.9355>:
> > > >
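
As a rough cross-check of the "N-squared" description, the repeat-count
table from the original post can be turned into a local growth exponent
per doubling. This is an illustrative Python sketch (the name
local_exponents is made up here, not anything from GCC). An exponent of 2
corresponds to a factor of 4 per doubling, so the factors above 4 at the
larger sizes suggest growth steeper than quadratic in that range.

```python
import math

# (repeats, seconds of "phase opt and generate") copied from the table
# in the original post.
TIMINGS = [
    (1000, 0.17),
    (2000, 0.49),
    (4000, 1.55),
    (8000, 7.56),
    (16000, 49.31),
    (32000, 281.29),
]


def local_exponents(timings):
    """For each pair of consecutive rows, estimate k where time ~ N**k.

    k = log(t1 / t0) / log(n1 / n0); k = 1 is linear, k = 2 quadratic.
    """
    exponents = []
    for (n0, t0), (n1, t1) in zip(timings, timings[1:]):
        exponents.append(math.log(t1 / t0) / math.log(n1 / n0))
    return exponents


for (n, _), k in zip(TIMINGS[1:], local_exponents(TIMINGS)):
    print(f"{n:>6} repeats: local exponent ~ {k:.2f}")
```

The estimate is noisy at the small sizes, where fixed startup cost is a
large share of the measured time, so the later rows are the more telling
ones.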
