Ralf Wildenhues <Ralf.Wildenhues <at> gmx.de> writes:
>
> Hello again, and apologies for breaking the threading,

No problem.  In truth, this is enough of an independent topic to be worth
the broken threading.

>
> I've done a wee bit of measuring now.  Time for running autoconf in OpenMPI
> is 15s with branch-1_4 and branch-1.6, 27s with master, and 23s when master
> is configured --disable-shared.

Thanks for the stats.

>
> Then, a gprof comparison between 1.6 and master shows that a significant other
> part of the slowdown is due to the fact that master has to do an indirect
> function call for every character in next_char.  Can't the module interface
> use larger boundaries than character for its interface, like reading a whole
> token or so?  I mean, we're talking about roughly 140M function calls here.

Sweet!  Your measurements confirmed what I already suspected.  And this
means a performance patch is already in the pipeline - the moment I port
stage 29 from the argv_ref branch (currently at [1], although that branch is
still being actively rewound at times as I rebase in various bug fixes),
the input engine will be doing just that - reading blocks of data rather
than bytes.

[1] http://git.savannah.gnu.org/gitweb/?p=m4.git;a=commitdiff;h=32c3fec7

>
> Then, I saw that debug stuff like m4_set_current_{file,line} was called veeery
> often (more than once per character).  Rebuilding optimized with -DNDEBUG got
> master to 18s (with --disable-shared).

OK, something I will take a look at improving.  The speedup from -DNDEBUG
comes from avoiding function call overhead, thanks to inline accessor
macros, but avoiding changing the current line and file more often than
necessary seems like a good idea.  At any rate, './configure
--disable-assert' is very much a performance improvement, on all of the m4
branches.

>
> The gprof output files seem to indicate that next_char is called much more
> often from m4__next_token in master than next_char_1 is from next_token in
> branch-1.6.  However, gcov output does not confirm this, so I guess this is
> an artifact from finite sampling density (and the amount that next_char_1
> is faster) or inlining artifacts.

This doesn't surprise me: in branch-1.6, the macro next_char inlines the
common case of rereading from a string, avoiding a number of next_char_1
calls, but in master, there is no inlining because all access is done
through indirect functions.

-- 
Eric Blake

_______________________________________________
M4-patches mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/m4-patches
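
For readers following the thread, the difference between a per-character
and a block-based input interface can be sketched roughly as below.  This
is only an illustration under stated assumptions: every name here
(input_source, read_char, read_block, count_lines_per_char, and so on) is
hypothetical and is not taken from the m4 sources or from the argv_ref
branch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical input source; none of these names come from m4. */
typedef struct input_source input_source;
struct input_source {
    /* Per-character interface: one indirect call per byte. */
    int (*read_char)(input_source *src);
    /* Block interface: one indirect call per run of bytes. */
    size_t (*read_block)(input_source *src, const char **start);
    const char *data;  /* unread text */
    size_t len;        /* number of unread bytes */
};

/* Return the next byte, or -1 at end of input. */
static int string_read_char(input_source *src)
{
    if (src->len == 0)
        return -1;
    src->len--;
    return (unsigned char) *src->data++;
}

/* Hand out all remaining bytes at once; 0 means end of input. */
static size_t string_read_block(input_source *src, const char **start)
{
    size_t n = src->len;
    *start = src->data;
    src->data += n;
    src->len = 0;
    return n;
}

void string_source_init(input_source *src, const char *text)
{
    src->read_char = string_read_char;
    src->read_block = string_read_block;
    src->data = text;
    src->len = strlen(text);
}

/* Count newlines using one indirect call per character. */
size_t count_lines_per_char(input_source *src, size_t *calls)
{
    size_t lines = 0;
    *calls = 0;
    for (;;) {
        int c = src->read_char(src);
        ++*calls;
        if (c == -1)
            break;
        if (c == '\n')
            lines++;
    }
    return lines;
}

/* Count newlines using one indirect call per block; the inner
   loop scans the block directly, with no function calls. */
size_t count_lines_per_block(input_source *src, size_t *calls)
{
    size_t lines = 0;
    *calls = 0;
    for (;;) {
        const char *p;
        size_t n = src->read_block(src, &p);
        ++*calls;
        if (n == 0)
            break;
        for (size_t i = 0; i < n; i++)
            if (p[i] == '\n')
                lines++;
    }
    return lines;
}
```

Both counting functions return the same result for the same text, but the
per-character version makes one indirect call for each byte plus one for
end of input, while the block version here makes only two calls in total.
With input on the scale mentioned above, that is the difference between
roughly 140M indirect calls and a far smaller number.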
