On Sat, Jan 3, 2015 at 3:29 PM, Eli Zaretskii <[email protected]> wrote:
> The bottle-neck is clearly process_node_text, it takes more than 1 sec
> when the node is "Index" in the ELisp manual.
>
> I timed the loop in process_node_text, and it takes about 0.22 msec
> per line on the average, and there are 5700 lines in that node.
>
> I tried to find the culprit in that loop, but it's hard to time such
> small intervals reliably.  My gut feeling is that the call to
> printed_representation is the reason: we call that function once for
> each character on the line.  But I cannot prove that, and I cannot
> explain why you don't see the same delay.  Perhaps the reason is that
> some functions called by printed_representation, which in my build are
> supplied by gnulib, are much faster in glibc.  This is based on the
> following profile I get from gprof:
>
> Each sample counts as 0.01 seconds.
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  Ts/call  Ts/call  name
>  50.00      0.02     0.02                             locale_charset
>  25.00      0.03     0.01                             wcwidth
>  12.50      0.04     0.01                             add_file_directory_to_path
>  12.50      0.04     0.01                             main
>   0.00      0.04     0.00   415064     0.00     0.00  printed_representation
>   0.00      0.04     0.00   229065     0.00     0.00  reset_conversion
>   0.00      0.04     0.00    61093     0.00     0.00  text_buffer_alloc
>   0.00      0.04     0.00    51681     0.00     0.00  text_buffer_iconv
>   0.00      0.04     0.00    51681     0.00     0.00  text_buffer_space_left
>   0.00      0.04     0.00    31362     0.00     0.00  skip_whitespace
>   0.00      0.04     0.00    19793     0.00     0.00  read_quoted_string
>
> As you see, wcwidth and locale_charset, both from gnulib in my build,
> take 75% of the time.

The calls to wcwidth are new; previously there was no handling of
characters that span two display columns.  I am going to simplify the
code in process_node_text so that it only calculates the line starts:
at the moment it is generic code that was previously used for the
screen update as well.  If that doesn't produce a speed-up, then the
remaining cost is most likely the call to wcwidth on every single
character.
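
To make the idea concrete, here is a rough sketch of what a loop that
only calculates line starts could look like.  It is not the actual
process_node_text code: find_line_starts and its parameters are names
I made up for illustration.  It assumes an ASCII-compatible multibyte
encoding such as UTF-8, counts every ASCII byte other than newline as
one column (tabs and control characters are ignored here), and only
calls mbrtowc and wcwidth for bytes outside the ASCII range.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700       /* asks <wchar.h> to declare wcwidth */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Repeat the POSIX prototype so the sketch also builds if the
   feature-test macro above is seen too late. */
int wcwidth (wchar_t wc);

/* Hypothetical helper, not the real Texinfo function.  Scan LEN bytes
   of TEXT and record the byte offset of each screen line start,
   wrapping at WIDTH columns.  Returns the number of line starts found;
   only the first MAX_STARTS are stored in STARTS. */
size_t
find_line_starts (const char *text, size_t len, int width,
                  size_t *starts, size_t max_starts)
{
  size_t n = 0, i = 0;
  int col = 0;
  mbstate_t st;

  assert (width > 0);
  memset (&st, 0, sizeof st);

  if (len > 0)
    {
      if (n < max_starts)
        starts[n] = 0;
      n++;
    }

  while (i < len)
    {
      unsigned char c = (unsigned char) text[i];
      size_t nbytes = 1;
      int w = 1;                /* assumed width of an ASCII byte */

      if (c == '\n')
        {
          /* A newline ends the line; the next byte, if any, starts one. */
          i++;
          col = 0;
          if (i < len)
            {
              if (n < max_starts)
                starts[n] = i;
              n++;
            }
          continue;
        }

      if (c >= 0x80)
        {
          /* Slow path: only non-ASCII bytes reach mbrtowc/wcwidth. */
          wchar_t wc;
          size_t r = mbrtowc (&wc, text + i, len - i, &st);

          if (r == (size_t) -1 || r == (size_t) -2 || r == 0)
            memset (&st, 0, sizeof st); /* bad input: one byte, one column */
          else
            {
              nbytes = r;
              w = wcwidth (wc);
              if (w < 0)
                w = 1;          /* non-printable: count as one column */
            }
        }

      if (col + w > width)
        {
          /* Character does not fit: wrap before it. */
          if (n < max_starts)
            starts[n] = i;
          n++;
          col = 0;
        }

      col += w;
      i += nbytes;
    }

  return n;
}
```

For the 9-byte string "abc\ndefgh" and a width of 3 columns, this
records line starts at byte offsets 0, 4 and 7.  Whether a fast path
like this helps in practice depends on where the time really goes, so
it would need to be profiled against the current code.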
