Eric Blake wrote:
> First off, you don't appear to have copyright assignment on file for M4
> yet, and this is not a trivial patch. Care to follow through with that?
Yes, I'll do this ASAP.
> Next, I'd like to finish merging the argv_ref into both branch-1_4 and
> master before applying your patch, while it's still fresh on my mind.
Sure, algorithmic changes should go in first.
> | + if (curr_quote.len1 == 1 && curr_quote.len2 == 1 && !input_change
> | + && isp)
>
> Not quite right. The condition here needs to be that quote_age is
> non-zero (certain 1-byte quote combinations don't optimize well according
> to the semantics required by M4, such as when they overlap with comment
> delimiters; the calculation of quote_age already took this into account).
I don't understand. In the current code, quote_age is taken is only taken
into account by the INPUT_CHAIN part of next_char and next_char_1. If a test
of quote_change is missing here in next_token, it is also missing in next_char
and next_char_1. - Or do you mean that a new boolean variable
'singlechar_quotes' could be introduced, which shall always be equivalent to
curr_quote.len1 == 1 && curr_quote.len2 == 1, and which shall be updated by
set_quotes?
> ~ I'm also considering adding a placeholder input block that always results
> in CHAR_EOF, so that isp is guaranteed to be non-null and we have one less
> branch in this loop.
Tried this already (see attached patch). It does not provide a noticeable
speedup: 41.0 sec instead of 41.1, that's less than 0.5%. You get more
speedup (about 1%) by use of __builtin_expect (also attached).
> Basically, rather than calling next_char() all the time, I envision:
>
> /* Return a pointer into the buffer available from the current input
> block, and set *LEN to the length of the result. If the next character to
> be parsed cannot be represented as an unsigned char (such as CHAR_EOF), or
> if the input block does not have read-ahead data buffered at the moment,
> return NULL, and the caller must fall back to using next_char(). */
> char *curr_buf (size_t *len);
>
> /* Discard LEN bytes from the current input block. LEN must be less than
> or equal to the previous length returned by a successful call to
> curr_buf(). */
> void consume (size_t len);
Very good. The entire idea of my patch was to
1. avoid a function call for every character,
2. treat consecutive characters which are all alike regarding syntax in a
single operation.
Inlining next_char and next_char_1 was a way to do it; these new primitives
are another way to do it, and a better one - since they don't require as much
code duplication.
Now I see where freadptr() and freadseek() come in handy...
> | + else if (chain->type == CHAIN_STR && chain->u.u_s.len > 0)
> | + {
> | + unsigned char curr_quote_1 =
> | + to_uchar (curr_quote.str1[0]);
>
> Unnecessary cast. char is assignable to unsigned char without issues.
Yes, but I prefer casts to be explicit rather than implicit.
> It seems a shame that there isn't really a function optimized for
> searching for the first of two characters among a fixed-size memory block.
Yes. We have strcspn but it is not well optimized for the case of only 1 or 2
delimiter chars.
> /* Return the address of the first character of either C1 or C2, treated
> as unsigned int, that occurs within the first N bytes of S; else return
> NULL if neither character occurs. */
> void *memchr2 (void const *s, int c1, int c2, size_t n);
Yes, I see where this function would come in handy.
Bruno
41.0 sec user + 0.9 sec system
*** input.c.patch1 2008-02-28 01:45:09.000000000 +0100
--- input.c 2008-02-28 01:59:43.000000000 +0100
***************
*** 75,81 ****
INPUT_STRING, /* String resulting from macro expansion. */
INPUT_FILE, /* File from command line or include. */
INPUT_MACRO, /* Builtin resulting from defn. */
! INPUT_CHAIN /* FIFO chain of separate strings and $@ refs. */
};
typedef enum input_type input_type;
--- 75,82 ----
INPUT_STRING, /* String resulting from macro expansion. */
INPUT_FILE, /* File from command line or include. */
INPUT_MACRO, /* Builtin resulting from defn. */
! INPUT_CHAIN, /* FIFO chain of separate strings and $@ refs. */
! INPUT_NONE /* End of the input block stack. */
};
typedef enum input_type input_type;
***************
*** 115,120 ****
--- 116,126 ----
u;
};
+ /* This input block is an indicator that the input stack is at its end.
+ This fake input block avoids the need for testing 'isp' in next_char()
+ and next_char_1(). */
+ static struct input_block input_block_none = { NULL, INPUT_NONE };
+
/* Current input file name. */
const char *current_file;
***************
*** 307,313 ****
{
/* Free any memory occupied by completely parsed strings. */
assert (next == NULL);
! while (isp && pop_input (false));
/* Reserve the next location on the obstack. */
next = (input_block *) obstack_alloc (current_input, sizeof *next);
--- 313,319 ----
{
/* Free any memory occupied by completely parsed strings. */
assert (next == NULL);
! while (isp->type != INPUT_NONE && pop_input (false));
/* Reserve the next location on the obstack. */
next = (input_block *) obstack_alloc (current_input, sizeof *next);
***************
*** 607,613 ****
return false;
if (debug_level & DEBUG_TRACE_INPUT)
{
! if (tmp)
DEBUG_MESSAGE2 ("input reverted to %s, line %d",
tmp->file, tmp->line);
else
--- 613,619 ----
return false;
if (debug_level & DEBUG_TRACE_INPUT)
{
! if (tmp->type != INPUT_NONE)
DEBUG_MESSAGE2 ("input reverted to %s, line %d",
tmp->file, tmp->line);
else
***************
*** 652,658 ****
obstack_free (current_input, NULL);
free (current_input);
! if (wsp == NULL)
{
/* End of the program. Free all memory even though we are about
to exit, since it makes leak detection easier. */
--- 658,664 ----
obstack_free (current_input, NULL);
free (current_input);
! if (wsp->type == INPUT_NONE)
{
/* End of the program. Free all memory even though we are about
to exit, since it makes leak detection easier. */
***************
*** 671,677 ****
obstack_init (wrapup_stack);
isp = wsp;
! wsp = NULL;
input_change = true;
return true;
--- 677,683 ----
obstack_init (wrapup_stack);
isp = wsp;
! wsp = &input_block_none;
input_change = true;
return true;
***************
*** 753,761 ****
while (1)
{
- if (block == NULL)
- return CHAR_EOF;
-
switch (block->type)
{
case INPUT_STRING:
--- 759,764 ----
***************
*** 819,824 ****
--- 822,830 ----
}
break;
+ case INPUT_NONE:
+ return CHAR_EOF;
+
default:
assert (!"peek_input");
abort ();
***************
*** 841,847 ****
`-------------------------------------------------------------------*/
#define next_char(AQ) \
! (isp && isp->str_len && !input_change \
? (isp->str_len--, to_uchar (*isp->u.u_s.str++)) \
: next_char_1 (AQ))
--- 847,853 ----
`-------------------------------------------------------------------*/
#define next_char(AQ) \
! (isp->str_len && !input_change \
? (isp->str_len--, to_uchar (*isp->u.u_s.str++)) \
: next_char_1 (AQ))
***************
*** 853,865 ****
while (1)
{
- if (isp == NULL)
- {
- current_file = "";
- current_line = 0;
- return CHAR_EOF;
- }
-
if (input_change)
{
current_file = isp->file;
--- 859,864 ----
***************
*** 949,954 ****
--- 948,958 ----
}
break;
+ case INPUT_NONE:
+ current_file = "";
+ current_line = 0;
+ return CHAR_EOF;
+
default:
assert (!"next_char_1");
abort ();
***************
*** 1195,1202 ****
obstack_init (&token_stack);
token_bottom = obstack_finish (&token_stack);
! isp = NULL;
! wsp = NULL;
next = NULL;
start_of_input_line = false;
--- 1199,1206 ----
obstack_init (&token_stack);
token_bottom = obstack_finish (&token_stack);
! isp = &input_block_none;
! wsp = &input_block_none;
next = NULL;
start_of_input_line = false;
40.7 sec user + 0.9 sec system
*** input.c.patch3 2008-02-28 02:11:11.000000000 +0100
--- input.c 2008-02-28 02:23:00.000000000 +0100
***************
*** 64,69 ****
--- 64,78 ----
# include "regex.h"
#endif /* ENABLE_CHANGEWORD */
+ /* __builtin_expect(condition, expected_value) tells GCC about the expected
+ value of boolean conditions. */
+ #if __GNUC__ < 3
+ # undef __builtin_expect
+ # define __builtin_expect(condition, expected_value) (condition)
+ #endif
+ #define likely(condition) __builtin_expect (condition, 1)
+ #define unlikely(condition) __builtin_expect (condition, 0)
+
/* Number of bytes where it is more efficient to inline the reference
as a string than it is to track reference bookkeeping for those
bytes. */
***************
*** 762,774 ****
switch (block->type)
{
case INPUT_STRING:
! if (!block->str_len)
break;
return to_uchar (block->u.u_s.str[0]);
case INPUT_FILE:
ch = getc (block->u.u_f.fp);
! if (ch != EOF)
{
ungetc (ch, block->u.u_f.fp);
return ch;
--- 771,783 ----
switch (block->type)
{
case INPUT_STRING:
! if (unlikely (!block->str_len))
break;
return to_uchar (block->u.u_s.str[0]);
case INPUT_FILE:
ch = getc (block->u.u_f.fp);
! if (likely (ch != EOF))
{
ungetc (ch, block->u.u_f.fp);
return ch;
***************
*** 847,853 ****
`-------------------------------------------------------------------*/
#define next_char(AQ) \
! (isp->str_len && !input_change \
? (isp->str_len--, to_uchar (*isp->u.u_s.str++)) \
: next_char_1 (AQ))
--- 856,862 ----
`-------------------------------------------------------------------*/
#define next_char(AQ) \
! (likely (isp->str_len) && likely (!input_change) \
? (isp->str_len--, to_uchar (*isp->u.u_s.str++)) \
: next_char_1 (AQ))
***************
*** 859,865 ****
while (1)
{
! if (input_change)
{
current_file = isp->file;
current_line = isp->line;
--- 868,874 ----
while (1)
{
! if (unlikely (input_change))
{
current_file = isp->file;
current_line = isp->line;
***************
*** 869,881 ****
switch (isp->type)
{
case INPUT_STRING:
! if (!isp->str_len)
break;
isp->str_len--;
return to_uchar (*isp->u.u_s.str++);
case INPUT_FILE:
! if (start_of_input_line)
{
start_of_input_line = false;
current_line = ++isp->line;
--- 878,890 ----
switch (isp->type)
{
case INPUT_STRING:
! if (unlikely (!isp->str_len))
break;
isp->str_len--;
return to_uchar (*isp->u.u_s.str++);
case INPUT_FILE:
! if (unlikely (start_of_input_line))
{
start_of_input_line = false;
current_line = ++isp->line;
***************
*** 884,895 ****
/* If stdin is a terminal, calling getc after peek_input
already called it would make the user have to hit ^D
twice to quit. */
! if (!isp->u.u_f.end)
{
ch = getc (isp->u.u_f.fp);
! if (ch != EOF)
{
! if (ch == '\n')
start_of_input_line = true;
return ch;
}
--- 893,904 ----
/* If stdin is a terminal, calling getc after peek_input
already called it would make the user have to hit ^D
twice to quit. */
! if (likely (!isp->u.u_f.end))
{
ch = getc (isp->u.u_f.fp);
! if (likely (ch != EOF))
{
! if (unlikely (ch == '\n'))
start_of_input_line = true;
return ch;
}
***************
*** 1516,1522 ****
/* Can't consume character until after CHAR_MACRO is handled. */
TOKEN_DATA_TYPE (td) = TOKEN_VOID;
ch = peek_input (allow_argv && current_quote_age);
! if (ch == CHAR_EOF)
{
#ifdef DEBUG_INPUT
xfprintf (stderr, "next_token -> EOF\n");
--- 1525,1531 ----
/* Can't consume character until after CHAR_MACRO is handled. */
TOKEN_DATA_TYPE (td) = TOKEN_VOID;
ch = peek_input (allow_argv && current_quote_age);
! if (unlikely (ch == CHAR_EOF))
{
#ifdef DEBUG_INPUT
xfprintf (stderr, "next_token -> EOF\n");
***************
*** 1524,1530 ****
next_char (false);
return TOKEN_EOF;
}
! if (ch == CHAR_MACRO)
{
init_macro_token (td);
next_char (false);
--- 1533,1539 ----
next_char (false);
return TOKEN_EOF;
}
! if (unlikely (ch == CHAR_MACRO))
{
init_macro_token (td);
next_char (false);
***************
*** 1534,1540 ****
#endif /* DEBUG_INPUT */
return TOKEN_MACDEF;
}
! if (ch == CHAR_ARGV)
{
init_argv_token (obs, td);
#ifdef DEBUG_INPUT
--- 1543,1549 ----
#endif /* DEBUG_INPUT */
return TOKEN_MACDEF;
}
! if (unlikely (ch == CHAR_ARGV))
{
init_argv_token (obs, td);
#ifdef DEBUG_INPUT
***************
*** 1549,1555 ****
next_char (false); /* Consume character we already peeked at. */
file = current_file;
*line = current_line;
! if (MATCH (ch, curr_comm.str1, true))
{
if (obs)
obs_td = obs;
--- 1558,1564 ----
next_char (false); /* Consume character we already peeked at. */
file = current_file;
*line = current_line;
! if (unlikely (MATCH (ch, curr_comm.str1, true)))
{
if (obs)
obs_td = obs;
***************
*** 1570,1576 ****
type = TOKEN_STRING;
}
! else if (default_word_regexp && (isalpha (ch) || ch == '_'))
{
obstack_1grow (&token_stack, ch);
while ((ch = peek_input (false)) < CHAR_EOF
--- 1579,1585 ----
type = TOKEN_STRING;
}
! else if (unlikely (default_word_regexp && (isalpha (ch) || ch == '_')))
{
obstack_1grow (&token_stack, ch);
while ((ch = peek_input (false)) < CHAR_EOF
***************
*** 1584,1590 ****
#ifdef ENABLE_CHANGEWORD
! else if (!default_word_regexp && word_regexp.fastmap[ch])
{
obstack_1grow (&token_stack, ch);
while (1)
--- 1593,1599 ----
#ifdef ENABLE_CHANGEWORD
! else if (unlikely (!default_word_regexp && word_regexp.fastmap[ch]))
{
obstack_1grow (&token_stack, ch);
while (1)
***************
*** 1644,1650 ****
while (1)
{
ch = next_char (obs != NULL && current_quote_age);
! if (ch == CHAR_EOF)
/* Current_file changed to "" if we see CHAR_EOF, use
the previous value we stored earlier. */
m4_error_at_line (EXIT_FAILURE, 0, file, *line, caller,
--- 1653,1659 ----
while (1)
{
ch = next_char (obs != NULL && current_quote_age);
! if (unlikely (ch == CHAR_EOF))
/* Current_file changed to "" if we see CHAR_EOF, use
the previous value we stored earlier. */
m4_error_at_line (EXIT_FAILURE, 0, file, *line, caller,
_______________________________________________
M4-discuss mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/m4-discuss