On Fri, 2026-03-20 at 10:55 -0500, Robert Dubner wrote:
>
> > -----Original Message-----
> > From: David Malcolm <[email protected]>
> > Sent: Friday, March 20, 2026 10:10
> > To: Robert Dubner <[email protected]>; [email protected]
> > Subject: Re: COBOL: Hoping for insight with middle-end computation time.
> >
> > On Thu, 2026-03-19 at 18:22 -0500, Robert Dubner wrote:
> > > It happens that COBOL has the COMPUTE statement. It takes the form
> > > of, for example, COMPUTE DDD = AAA + BBB.
> > >
> > > We implement that by creating a temporary variable, using that as
> > > the target of an addition of AAA and BBB, and then doing an
> > > assignment to DDD. (Recall that COBOL variables can be quite
> > > complex, so we are a long way from being able to do this with an
> > > ADD_EXPR.)
> > >
> > > We have determined that the way I've been producing GENERIC for
> > > that results in N-squared computation time somewhere in the
> > > middle end.
> > >
> > > I have been tearing my hair out trying to figure out what's
> > > causing that N-squared behavior. I commented out the assignment,
> > > and I got rid of the arithmetic. All that's left is the creation
> > > of the temporary, and some IF statements that are generated to
> > > test for errors along the way of the computation. (COBOL has a
> > > very rich error-detection and exception-generating facility.)
> > >
> > > The remaining GIMPLE for a single iteration (as shown by
> > > -fdump-tree-gimple) is shown below. The "phase opt and generate"
> > > times for repetitions of that GIMPLE are shown here:
> > >
> > >              phase opt &
> > > Repeats     generate (s)    Factor
> > >   1,000         0.17
> > >   2,000         0.49         2.9
> > >   4,000         1.55         3.2
> > >   8,000         7.56         4.9
> > >  16,000        49.31         6.5
> > >  32,000       281.29         5.7
> > >
> > > I have been struggling with this for days.
> > > Is there an explanation for why the following GIMPLE is resulting
> > > in that N-squared behavior?
> >
> > Hi Bob. You cite the overall "phase opt and generate" times, but no
> > data on how this is spread across the various optimization passes.
> >
> > What's the output of -ftime-report on your workload?
> >
> > In particular, are there any specific passes that are responsible
> > for the growth in time (and thus where we can pinpoint a bug), or is
> > the time evenly distributed across all of them?
> >
> > Sorry if this is a silly question
> > Dave
>
> The only silly thing is that in an egregious display of monumental
> arrogance and ignorance, some years back I decided to take on the
> problem of code generation for a new front end without having had any
> prior experience with GCC internals or compiler theory, and without
> access to anybody who actually knew anything.
>
> I hope you've seen my response to Richard. For a compilation with
>
>  phase opt and generate            :  14.95 ( 92%)   259M ( 88%)
>
> the next big component is
>
>  thread pro- & epilogue            :  10.45 ( 64%)   4104 (  0%)
>
> What that means is yet one more mystery to me.
64% of the wallclock time is being accounted to this timing item.

Looking in timevar.def, I see that this timing item is:

  DEFTIMEVAR (TV_THREAD_PROLOGUE_AND_EPILOGUE, "thread pro- & epilogue")

Grepping for TV_THREAD_PROLOGUE_AND_EPILOGUE, I see two passes in
function.cc that account their time to this timevar:
pass_thread_prologue_and_epilogue and
pass_late_thread_prologue_and_epilogue.

Both of these passes ultimately call the function
rest_of_handle_thread_prologue_and_epilogue.

So the slowdown is presumably somewhere inside there.

Dave

>
> Here's the entire output of -ftime-report-details:
>
> Time variable                          wall           GGC
>  phase setup                       :   0.01 (  0%)   150k (  0%)
>  phase parsing                     :   1.33 (  8%)    36M ( 12%)
>  phase opt and generate            :  14.95 ( 92%)   259M ( 88%)
>  phase last asm                    :   0.02 (  0%)   377k (  0%)
>  phase finalize                    :   0.01 (  0%)      0 (  0%)
>  garbage collection                :   0.09 (  1%)      0 (  0%)
>  callgraph construction            :   0.13 (  1%)    12M (  4%)
>  `- CFG verifier                   :   0.02 (  0%)      0 (  0%)
>  `- tree STMT verifier             :   0.16 (  1%)      0 (  0%)
>  `- symout                         :   0.01 (  0%)    10M (  4%)
>  `- tree SSA verifier              :   0.04 (  0%)      0 (  0%)
>  `- garbage collection             :   0.01 (  0%)      0 (  0%)
>  callgraph optimization            :   0.04 (  0%)      0 (  0%)
>  `- dominance computation          :   0.01 (  0%)      0 (  0%)
>  `- CFG verifier                   :   0.01 (  0%)      0 (  0%)
>  `- tree STMT verifier             :   0.21 (  1%)      0 (  0%)
>  `- tree SSA verifier              :   0.06 (  0%)      0 (  0%)
>  `- garbage collection             :   0.03 (  0%)      0 (  0%)
>  callgraph ipa passes              :   1.14 (  7%)    35M ( 12%)
>  ipa function summary              :   0.00 (  0%)   1832 (  0%)
>  `- tree SSA verifier              :   0.01 (  0%)      0 (  0%)
>  `- tree STMT verifier             :   0.02 (  0%)      0 (  0%)
>  ipa inlining heuristics           :   0.00 (  0%)      0 (  0%)
>  `- tree SSA verifier              :   0.01 (  0%)      0 (  0%)
>  `- tree STMT verifier             :   0.02 (  0%)      0 (  0%)
>  ipa comdats                       :   0.00 (  0%)      0 (  0%)
>  `- tree SSA verifier              :   0.01 (  0%)      0 (  0%)
>  `- tree STMT verifier             :   0.02 (  0%)      0 (  0%)
>  ipa free lang data                :   0.00 (  0%)      0 (  0%)
>  `- tree STMT verifier             :   0.02 (  0%)      0 (  0%)
>  ipa free inline summary           :   0.00 (  0%)      0 (  0%)
>  `- tree SSA verifier              :   0.01 (  0%)      0 (  0%)
>  `- tree STMT verifier             :   0.04 (  0%)      0 (  0%)
>  ipa modref                        :   0.00 (  0%)      0 (  0%)
>  `- tree SSA verifier              :   0.01 (  0%)      0 (  0%)
>  `- tree STMT verifier             :   0.02 (  0%)      0 (  0%)
>  cfg construction                  :   0.01 (  0%)   1232 (  0%)
>  `- rebuild jump labels            :   0.01 (  0%)      0 (  0%)
>  `- verify RTL sharing             :   0.03 (  0%)      0 (  0%)
>  `- CFG verifier                   :   0.03 (  0%)      0 (  0%)
>  cfg cleanup                       :   0.01 (  0%)    208 (  0%)
>  `- CFG verifier                   :   0.03 (  0%)      0 (  0%)
>  CFG verifier                      :   0.40 (  2%)      0 (  0%)
>  trivially dead code               :   0.02 (  0%)      0 (  0%)
>  df scan insns                     :   0.07 (  0%)     96 (  0%)
>  `- verify RTL sharing             :   0.01 (  0%)      0 (  0%)
>  `- CFG verifier                   :   0.01 (  0%)      0 (  0%)
>  df live regs                      :   0.03 (  0%)      0 (  0%)
>  df reg dead/unused notes          :   0.03 (  0%)  2628k (  1%)
>  register information              :   0.01 (  0%)      0 (  0%)
>  alias analysis                    :   0.01 (  0%)  1024k (  0%)
>  rebuild jump labels               :   0.01 (  0%)    168 (  0%)
>  parser (global)                   :   1.33 (  8%)    36M ( 12%)
>  early inlining heuristics         :   0.00 (  0%)      0 (  0%)
>  `- tree SSA verifier              :   0.01 (  0%)      0 (  0%)
>  `- tree STMT verifier             :   0.02 (  0%)      0 (  0%)
>  inline parameters                 :   0.03 (  0%)    672 (  0%)
>  `- tree SSA verifier              :   0.02 (  0%)      0 (  0%)
>  `- tree STMT verifier             :   0.04 (  0%)      0 (  0%)
>  tree gimplify                     :   0.09 (  1%)    50M ( 17%)
>  tree eh                           :   0.01 (  0%)    584 (  0%)
>  tree CFG construction             :   0.02 (  0%)    13M (  5%)
>  `- tree STMT verifier             :   0.02 (  0%)      0 (  0%)
>  tree CFG cleanup                  :   0.02 (  0%)      0 (  0%)
>  `- tree SSA verifier              :   0.01 (  0%)      0 (  0%)
>  `- tree STMT verifier             :   0.02 (  0%)      0 (  0%)
>  `- dominance computation          :   0.01 (  0%)      0 (  0%)
>  `- CFG verifier                   :   0.01 (  0%)      0 (  0%)
>  tree SSA other                    :   0.00 (  0%)      0 (  0%)
>  `- tree SSA verifier              :   0.01 (  0%)      0 (  0%)
>  `- tree STMT verifier             :   0.02 (  0%)      0 (  0%)
>  tree SSA rewrite                  :   0.03 (  0%)    20M (  7%)
>  `- tree STMT verifier             :   0.02 (  0%)      0 (  0%)
>  `- tree operand scan              :   0.01 (  0%)  8470k (  3%)
>  `- tree SSA verifier              :   0.01 (  0%)      0 (  0%)
>  tree operand scan                 :   0.01 (  0%)  8470k (  3%)
>  tree SSA verifier                 :   0.36 (  2%)      0 (  0%)
>  `- dominance computation          :   0.06 (  0%)      0 (  0%)
>  tree STMT verifier                :   0.88 (  5%)      0 (  0%)
>  tree switch lowering              :   0.00 (  0%)      0 (  0%)
>  `- tree SSA verifier              :   0.01 (  0%)      0 (  0%)
>  `- tree STMT verifier             :   0.02 (  0%)      0 (  0%)
>  callgraph verifier                :   0.06 (  0%)      0 (  0%)
>  `- callgraph verifier             :   0.06 (  0%)      0 (  0%)
>  dominance computation             :   0.12 (  1%)      0 (  0%)
>  out of ssa                        :   0.03 (  0%)    952 (  0%)
>  expand vars                       :   0.01 (  0%)  7040k (  2%)
>  expand                            :   0.16 (  1%)    86M ( 29%)
>  `- CFG verifier                   :   0.01 (  0%)      0 (  0%)
>  `- verify RTL sharing             :   0.01 (  0%)      0 (  0%)
>  `- out of ssa                     :   0.03 (  0%)    952 (  0%)
>  `- expand vars                    :   0.01 (  0%)  7040k (  2%)
>  `- post expand cleanups           :   0.01 (  0%)     96 (  0%)
>  post expand cleanups              :   0.01 (  0%)   3712 (  0%)
>  `- rebuild jump labels            :   0.01 (  0%)    168 (  0%)
>  `- CFG verifier                   :   0.01 (  0%)      0 (  0%)
>  jump                              :   0.00 (  0%)      0 (  0%)
>  `- verify RTL sharing             :   0.03 (  0%)      0 (  0%)
>  `- trivially dead code            :   0.01 (  0%)      0 (  0%)
>  `- CFG verifier                   :   0.02 (  0%)      0 (  0%)
>  loop init                         :   0.01 (  0%)   3040 (  0%)
>  mode switching                    :   0.00 (  0%)      0 (  0%)
>  `- verify RTL sharing             :   0.01 (  0%)      0 (  0%)
>  `- CFG verifier                   :   0.01 (  0%)      0 (  0%)
>  integrated RA                     :   0.49 (  3%)    12M (  4%)
>  `- register information           :   0.01 (  0%)      0 (  0%)
>  `- verify RTL sharing             :   0.02 (  0%)      0 (  0%)
>  `- df reg dead/unused notes       :   0.03 (  0%)  2628k (  1%)
>  `- alias analysis                 :   0.01 (  0%)  1024k (  0%)
>  `- trivially dead code            :   0.01 (  0%)      0 (  0%)
>  `- CFG verifier                   :   0.01 (  0%)      0 (  0%)
>  `- df live regs                   :   0.01 (  0%)      0 (  0%)
>  LRA non-specific                  :   0.22 (  1%)    352 (  0%)
>  `- LRA hard reg assignment        :   0.01 (  0%)      0 (  0%)
>  `- LRA virtuals elimination       :   0.17 (  1%)    23M (  8%)
>  `- LRA create live ranges         :   0.01 (  0%)      0 (  0%)
>  LRA virtuals elimination          :   0.17 (  1%)    23M (  8%)
>  LRA create live ranges            :   0.01 (  0%)      0 (  0%)
>  LRA hard reg assignment           :   0.01 (  0%)      0 (  0%)
>  reload                            :   0.00 (  0%)     48 (  0%)
>  `- verify RTL sharing             :   0.02 (  0%)      0 (  0%)
>  `- integrated RA                  :   0.05 (  0%)     96 (  0%)
>  `- LRA non-specific               :   0.06 (  0%)     24 (  0%)
>  `- CFG verifier                   :   0.01 (  0%)      0 (  0%)
>  thread pro- & epilogue            :  10.45 ( 64%)   4104 (  0%)
>  `- CFG verifier                   :   0.03 (  0%)      0 (  0%)
>  `- verify RTL sharing             :   0.02 (  0%)      0 (  0%)
>  `- df live regs                   :   0.01 (  0%)      0 (  0%)
>  machine dep reorg                 :   0.00 (  0%)      0 (  0%)
>  `- verify RTL sharing             :   0.02 (  0%)      0 (  0%)
>  shorten branches                  :   0.05 (  0%)      0 (  0%)
>  `- verify RTL sharing             :   0.02 (  0%)      0 (  0%)
>  reg stack                         :   0.00 (  0%)      0 (  0%)
>  `- verify RTL sharing             :   0.04 (  0%)      0 (  0%)
>  `- CFG verifier                   :   0.02 (  0%)      0 (  0%)
>  final                             :   0.10 (  1%)  3887k (  1%)
>  `- verify RTL sharing             :   0.04 (  0%)      0 (  0%)
>  `- symout                         :   0.01 (  0%)  7192k (  2%)
>  symout                            :   0.04 (  0%)    18M (  6%)
>  access analysis                   :   0.06 (  0%)     24 (  0%)
>  `- tree SSA verifier              :   0.02 (  0%)      0 (  0%)
>  `- tree STMT verifier             :   0.04 (  0%)      0 (  0%)
>  `- dominance computation          :   0.01 (  0%)      0 (  0%)
>  early local passes                :   0.00 (  0%)      0 (  0%)
>  `- tree STMT verifier             :   0.02 (  0%)      0 (  0%)
>  rest of compilation               :   0.10 (  1%)  1754k (  1%)
>  `- dominance computation          :   0.01 (  0%)      0 (  0%)
>  `- verify RTL sharing             :   0.24 (  1%)      0 (  0%)
>  `- garbage collection             :   0.04 (  0%)      0 (  0%)
>  `- tree STMT verifier             :   0.13 (  1%)      0 (  0%)
>  `- tree SSA verifier              :   0.06 (  0%)      0 (  0%)
>  `- CFG verifier                   :   0.13 (  1%)      0 (  0%)
>  unaccounted post reload           :   0.00 (  0%)      0 (  0%)
>  `- verify RTL sharing             :   0.02 (  0%)      0 (  0%)
>  `- CFG verifier                   :   0.01 (  0%)      0 (  0%)
>  unaccounted late compilation      :   0.00 (  0%)      0 (  0%)
>  `- verify RTL sharing             :   0.02 (  0%)      0 (  0%)
>  `- CFG verifier                   :   0.01 (  0%)      0 (  0%)
>  verify RTL sharing                :   0.54 (  3%)      0 (  0%)
>  TOTAL                             :  16.32          296M
>
> > >
> > > I agree that the conditionals are a bit convoluted, and Jim and I
> > > are working to do COMPUTE in a much improved way. But I still feel
> > > a need to understand what's going on!
> > >
> > > Thanks so much for any insight into what might be happening here.
> > >
> > >   _intermediate__stack501_503.0.0.data = &_stack501_data_517.0;
> > >   _intermediate__stack501_503.0.0.capacity = 16;
> > >   _intermediate__stack501_503.0.0.allocated = 16;
> > >   _intermediate__stack501_503.0.0.offset = 0;
> > >   _intermediate__stack501_503.0.0.name = &"_stack501"[0];
> > >   _intermediate__stack501_503.0.0.picture = &""[0];
> > >   _intermediate__stack501_503.0.0.initial = 0B;
> > >   _intermediate__stack501_503.0.0.parent = 0B;
> > >   _intermediate__stack501_503.0.0.occurs_lower = 0;
> > >   _intermediate__stack501_503.0.0.occurs_upper = 0;
> > >   _intermediate__stack501_503.0.0.attr = 4160;
> > >   _intermediate__stack501_503.0.0.type = 6;
> > >   _intermediate__stack501_503.0.0.level = 0;
> > >   _intermediate__stack501_503.0.0.digits = 37;
> > >   _intermediate__stack501_503.0.0.rdigits = 0;
> > >   _intermediate__stack501_503.0.0.encoding = 1;
> > >   _intermediate__stack501_503.0.0.alphabet = 0;
> > >   D.2812 = 0;
> > >   D.2813 = 0;
> > >   _1013 = D.2812 & 18;
> > >   if (_1013 != 0) goto <D.9353>; else goto <D.9354>;
> > >   <D.9353>:
> > >   goto <D.9355>;
> > >   <D.9354>:
> > >   D.2814 = 0;
> > >   ..pa_erf.1519_1014 = ..pa_erf;
> > >   D.2813 = D.2813 | ..pa_erf.1519_1014;
> > >   if (D.2813 != 0) goto <D.9357>; else goto <D.9358>;
> > >   <D.9357>:
> > >   goto <D.9359>;
> > >   <D.9358>:
> > >   <D.9359>:
> > >   <D.9355>:
