On 09/16/2011 03:03 PM, [email protected] wrote:
Please remember that dfa.[ch] is code shared with gawk and, I think,
also gettext (although I don't know how up to date gettext's copy is).
I'd really prefer not to have too many GREP_xxx kinds of things in those
files. (It's OK in the rest of grep, of course. :-)
We could separate the variables for dfa and the rest of grep. Grep then
just needs "#define DFA_MB_CUR_MAX GREP_MB_CUR_MAX" (and gawk can
similarly use "#define DFA_MB_CUR_MAX gawk_mb_cur_max").
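A minimal sketch of that split, assuming dfa.c refers only to DFA_MB_CUR_MAX and each host program supplies the definition. The variable definition, the fallback to MB_CUR_MAX, and the helper dfa_needs_multibyte() are hypothetical illustrations, not code from grep, gawk, or dfa.c.

```c
#include <stdlib.h>

/* ---- Host side (grep): hypothetical cached value and its macro. ---- */
int grep_mb_cur_max = 1;                 /* hypothetical; set once at startup */
#define GREP_MB_CUR_MAX grep_mb_cur_max
#define DFA_MB_CUR_MAX GREP_MB_CUR_MAX   /* the one line grep would need */

/* ---- Shared dfa.c side: refers only to DFA_MB_CUR_MAX. ---- */
#ifndef DFA_MB_CUR_MAX
# define DFA_MB_CUR_MAX MB_CUR_MAX       /* hypothetical fallback if the host defines nothing */
#endif

/* Hypothetical helper: nonzero when the multibyte code paths are needed. */
static int dfa_needs_multibyte (void)
{
  return DFA_MB_CUR_MAX > 1;
}
```

Gawk would replace only the host-side lines, e.g. "#define DFA_MB_CUR_MAX gawk_mb_cur_max", leaving the shared file free of GREP_xxx names.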
For what it's worth, MB_CUR_MAX is a function call in glibc. There were
some cases in gawk where I was losing a noticeable amount of time by
calling it frequently. So I set up a global variable, gawk_mb_cur_max,
and initialize it in main(), since the result should never change during
a single run of the program. It made a difference.
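A rough sketch of the caching technique, assuming the value is read once after the locale is set. gawk_mb_cur_max is the name mentioned above; init_mb_cur_max() and in_multibyte_locale() are hypothetical names for illustration only.

```c
#include <locale.h>
#include <stdlib.h>

/* Cached copy of MB_CUR_MAX. In glibc, MB_CUR_MAX expands to a function
   call, so frequently executed code reads this variable instead. */
int gawk_mb_cur_max = 1;

/* Hypothetical name: call once from main(), after setlocale(), because
   MB_CUR_MAX depends on the LC_CTYPE category in effect. */
static void init_mb_cur_max (void)
{
  gawk_mb_cur_max = (int) MB_CUR_MAX;
}

/* Hypothetical hot-path check: a plain load, no library call. */
static int in_multibyte_locale (void)
{
  return gawk_mb_cur_max > 1;
}
```

In a program, main() would call setlocale(LC_ALL, "") and then init_mb_cur_max() before doing any matching; if the program later changed the locale, the cache would have to be refreshed.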
Interesting. We do have a field for mb_cur_max in dfaexec, but it is
there because some UTF-8 regexes can be run as if the locale were
single-byte. I suspect, however, that awk programs (especially badly
written ones!) do more regex compilation than grep, up to one compilation
per match. For grep it shouldn't really matter.
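To illustrate why a per-regex field is useful, here is a heavily simplified sketch, assuming the decision is made once per compilation. struct toy_dfa, its functions, and the byte_safe flag are hypothetical and do not mirror dfa.c; the caller decides when byte matching is safe (for example, a pattern made only of ASCII literals, since in UTF-8 no byte of a multibyte character falls in the ASCII range).

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for a compiled DFA. */
struct toy_dfa
{
  int mb_cur_max;   /* fixed at compile time; may be forced to 1 */
};

/* Record the character width for this pattern. If the caller says the
   pattern can be matched byte by byte, pretend the locale is single-byte
   even when the real locale (e.g. UTF-8) is not. */
static void toy_dfa_compile (struct toy_dfa *d, int locale_mb_cur_max,
                             bool byte_safe)
{
  d->mb_cur_max = byte_safe ? 1 : locale_mb_cur_max;
}

/* The matcher consults the per-DFA field, not the global locale. */
static bool toy_dfa_uses_multibyte (const struct toy_dfa *d)
{
  return d->mb_cur_max > 1;
}
```

The locale query happens once per compilation, which is why the cost grows with the number of compilations (a concern for awk) rather than with the number of matches.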
Having separate variables grep_mb_cur_max and dfa_mb_cur_max (for the
reasons Arnold explained) would work, but since a variable's value is not
known at compile time, it would make it impossible for the compiler to
throw away the multibyte code when MBS_SUPPORT is zero.
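One way to keep both properties, sketched under the assumption of an optimizing compiler: define DFA_MB_CUR_MAX as a variable only when MBS_SUPPORT is nonzero, and as the constant 1 otherwise, so every "> 1" test folds to false and the multibyte branch becomes dead code. dfa_mb_cur_max is the name mentioned above; char_length() and the MBS_SUPPORT default are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

#ifndef MBS_SUPPORT
# define MBS_SUPPORT 1                   /* hypothetical default for this sketch */
#endif

#if MBS_SUPPORT
int dfa_mb_cur_max = 1;                  /* set once at startup by the host */
# define DFA_MB_CUR_MAX dfa_mb_cur_max
#else
/* A constant lets the compiler prove "DFA_MB_CUR_MAX > 1" false and drop
   the multibyte branch below entirely. */
# define DFA_MB_CUR_MAX 1
#endif

/* Hypothetical matcher step: length in bytes of the character at p. */
static size_t char_length (const char *p, size_t n)
{
  if (DFA_MB_CUR_MAX > 1)
    {
      /* Multibyte path; treat invalid or empty input as one byte. */
      int len = mblen (p, n);
      return len > 0 ? (size_t) len : 1;
    }
  return 1;                              /* single-byte path */
}
```

Compiled with -DMBS_SUPPORT=0, the mblen() call is unreachable; with a plain variable, the compiler must keep it.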
Paolo