Re: performance of m4-1.9a (was: popdef(undefined), __m4_version__)

Eric Blake Mon, 11 Aug 2008 13:39:02 -0700

Ralf Wildenhues <Ralf.Wildenhues <at> gmx.de> writes:

> 
> Hello again, and apologies for breaking the threading,


No problem.  In truth, this is enough of an independent topic to be worth the 
broken threading.

> 
> I've done a wee bit of measuring now.  Time for running autoconf in OpenMPI
> is 15s with branch-1_4 and branch-1.6, 27s with master, and 23s when master
> is configured --disable-shared.

Thanks for the stats.

> 
> Then, a gprof comparison between 1.6 and master shows that a significant other
> part of the slowdown is due to the fact that master has to do an indirect
> function call for every character in next_char.  Can't the module interface
> use larger boundaries than character for its interface, like reading a whole
> token or so?  I mean, we're talking about roughly 140M function calls here.

Sweet!  Your measurements confirm what I already suspected, which means a 
performance patch is already in the pipeline: once I port stage 29 from the 
argv_ref branch (currently at [1], although that branch is still being 
rewound at times as I rebase in various bug fixes), the input engine will do 
just that, reading blocks of data rather than individual bytes.

[1] http://git.savannah.gnu.org/gitweb/?p=m4.git;a=commitdiff;h=32c3fec7
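
To make the difference concrete, here is a minimal sketch of the two styles of 
interface.  It is not the actual m4 module interface: struct input_source, 
string_next_char, string_next_block and the two counting functions are 
hypothetical names made up for illustration.  The byte-wise loop pays one 
indirect call per character, while the block-wise loop pays one indirect call 
per block and scans the block directly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical input source; NOT m4's real module interface.  */
struct input_source
{
  const char *buf;              /* backing data */
  size_t len;                   /* total bytes in buf */
  size_t pos;                   /* offset of the next unread byte */
  /* Byte interface: one indirect call per character.  */
  int (*next_char) (struct input_source *);
  /* Block interface: hand out all remaining bytes in one call.  */
  const char *(*next_block) (struct input_source *, size_t *);
};

static int
string_next_char (struct input_source *in)
{
  return in->pos < in->len ? (unsigned char) in->buf[in->pos++] : -1;
}

static const char *
string_next_block (struct input_source *in, size_t *n)
{
  const char *start = in->buf + in->pos;
  *n = in->len - in->pos;
  in->pos = in->len;
  return *n ? start : NULL;     /* NULL once the input is exhausted */
}

static void
string_source_init (struct input_source *in, const char *s)
{
  assert (s != NULL);
  in->buf = s;
  in->len = strlen (s);
  in->pos = 0;
  in->next_char = string_next_char;
  in->next_block = string_next_block;
}

/* Count newlines with one indirect call per byte.  */
static size_t
count_newlines_bytewise (struct input_source *in)
{
  size_t n = 0;
  int c;
  while ((c = in->next_char (in)) != -1)
    if (c == '\n')
      n++;
  return n;
}

/* Count newlines with one indirect call per block.  */
static size_t
count_newlines_blockwise (struct input_source *in)
{
  size_t n = 0;
  size_t len;
  const char *p;
  while ((p = in->next_block (in, &len)) != NULL)
    {
      const char *end = p + len;
      while ((p = memchr (p, '\n', (size_t) (end - p))) != NULL)
        {
          n++;
          p++;
        }
    }
  return n;
}
```

Both functions return the same count for the same input; only the number of 
indirect calls differs.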

> 
> Then, I saw that debug stuff like m4_set_current_{file,line} was called veeery
> often (more than once per character).  Rebuilding optimized with -DNDEBUG got
> master to 18s (with --disable-shared).

OK, something I will take a look at improving.  The speedup from -DNDEBUG 
comes from avoiding function-call overhead, since the accessors become inline 
macros, but avoiding changing the current line and file more often than 
necessary still seems like a good idea.  At any rate, './configure 
--disable-assert' is very much a performance improvement on all of the m4 
branches.
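
As a rough illustration of both points (not the real m4 code; struct ctx, 
ctx_set_line and advance_over are hypothetical names), an accessor can be a 
checked function in debug builds and a plain macro under NDEBUG, and the 
caller can update the line number once per block of text rather than once per 
character:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical context; the real m4 context has many more fields.  */
struct ctx
{
  const char *file;
  int line;
};

#ifdef NDEBUG
/* Optimized build: the accessor is a plain assignment, no call.  */
# define ctx_set_line(c, n) ((c)->line = (n))
#else
/* Debug build: a real function that can validate its arguments.  */
static int
ctx_set_line (struct ctx *c, int n)
{
  assert (c != NULL);
  assert (n >= 0);
  return c->line = n;
}
#endif

/* Advance over LEN bytes of TEXT, touching the line number at most
   once per block instead of once per character.  */
static void
advance_over (struct ctx *c, const char *text, size_t len)
{
  int newlines = 0;
  size_t i;
  for (i = 0; i < len; i++)
    if (text[i] == '\n')
      newlines++;
  if (newlines > 0)
    ctx_set_line (c, c->line + newlines);
}
```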

> 
> The gprof output files seem to indicate that next_char is called much more
> often from m4__next_token in master than next_char_1 is from next_token in
> branch-1.6. However, gcov output does not confirm this, so I guess this is
> an artifact from finite sampling density (and the amount that next_char_1
> is faster) or inlining artifacts.

This doesn't surprise me: in branch-1.6, the macro next_char inlines the common 
case of rereading from a string, avoiding a number of next_char_1 calls, but in 
master, there is no inlining because all access is done through indirect 
functions.
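
The general shape of that inlining, as a sketch only (struct input, NEXT_CHAR 
and next_char_slow are hypothetical stand-ins, not the branch-1.6 source), is a 
macro that handles the common case in place and calls a function only when the 
current buffer runs out:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical input state, not the real branch-1.6 structure.  */
struct input
{
  const char *cur;              /* next unread byte */
  const char *end;              /* one past the last byte */
  int slow_calls;               /* how often the slow path ran */
};

/* Slow path: a real implementation would pop to the next input
   source here; this sketch just reports end of input.  */
static int
next_char_slow (struct input *in)
{
  in->slow_calls++;
  return -1;
}

/* Fast path inlined at the call site.  Note that IN is evaluated
   more than once, so it must not have side effects.  */
#define NEXT_CHAR(in)                                   \
  ((in)->cur < (in)->end                                \
   ? (int) (unsigned char) *(in)->cur++                 \
   : next_char_slow (in))

/* Consume all input and return the number of bytes seen.  */
static size_t
consume_all (struct input *in)
{
  size_t n = 0;
  assert (in != NULL);
  while (NEXT_CHAR (in) != -1)
    n++;
  return n;
}
```

With a three-byte buffer, the loop reads three characters inline and calls 
next_char_slow exactly once, at end of input.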

-- 
Eric Blake




_______________________________________________
M4-patches mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/m4-patches
