On 2/22/07, Russ Cox <[EMAIL PROTECTED]> wrote:
> The Plan 9 regexp library matches the old Unix egrep command.
> Any regexp you'd try under Plan 9 should work with new egreps,
> though not vice versa -- new egreps tend to have newfangled
> additions like [:upper:] and \w and {4,6} for repetition.
This came up as I was implementing my C lexer for the compilers class
I'm taking. How hard would it be to allow access to regcomp(2)'s
internals, so I could build up a regexp part by part, a la lex?
For example, to recognize C99 hexadecimal floating-point constants, I
wrote a second program that builds up the regexp piece by piece using
smprint(2), then compiles the whole thing:
    char *decdig = "([0-9])",
         *hexdig = "([0-9A-Fa-f])",
         *sign = "([+\\-])",
         *dot = "(\\.)",
         *dseq, *dexp, *dfrac, *decflt,
         *hseq, *bexp, *hfrac, *hexflt;

    dseq = smprint("(%s+)", decdig);
    dexp = smprint("([Ee]%s?%s)", sign, dseq);
    dfrac = smprint("((%s?%s%s)|(%s%s))", dseq, dot, dseq, dseq, dot);
    decflt = smprint("(%s%s?)|(%s%s)", dfrac, dexp, dseq, dexp);
    regcomp(decflt); // make sure it compiles
    print("decfloat: %s\n", decflt);

    hseq = smprint("(%s+)", hexdig);
    bexp = smprint("([Pp]%s?%s)", sign, dseq);
    hfrac = smprint("((%s?%s%s)|(%s%s))", hseq, dot, hseq, hseq, dot);
    hexflt = smprint("0[Xx](%s|%s)%s", hfrac, hseq, bexp);
    regcomp(hexflt); // make sure it compiles
    print("hexfloat: %s\n", hexflt);
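
The fragment above only confirms that the patterns compile. A rough way to
also check what they accept is sketched below for the hexadecimal case. It
is a port to POSIX C, not Plan 9 code: xsmprint is a made-up stand-in for
smprint(2), build_hexflt and matches are hypothetical helper names, and
matching goes through POSIX regcomp(3)/regexec(3) with REG_EXTENDED rather
than the Plan 9 library. The sign class is written [+-] here because a
backslash inside a POSIX bracket expression is an ordinary character.

```c
#include <regex.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for Plan 9's smprint: format into malloc'd memory.
 * Aborts on failure so callers never see NULL. */
char *
xsmprint(const char *fmt, ...)
{
    va_list ap;
    int n;
    char *s;

    va_start(ap, fmt);
    n = vsnprintf(NULL, 0, fmt, ap);    /* first pass: measure */
    va_end(ap);
    if (n < 0 || (s = malloc((size_t)n + 1)) == NULL)
        abort();
    va_start(ap, fmt);
    vsnprintf(s, (size_t)n + 1, fmt, ap);   /* second pass: format */
    va_end(ap);
    return s;
}

/* Assemble the hexadecimal-float pattern from the same pieces as above.
 * The caller frees the result. */
char *
build_hexflt(void)
{
    const char *decdig = "([0-9])", *hexdig = "([0-9A-Fa-f])";
    const char *sign = "([+-])", *dot = "(\\.)";
    char *dseq, *hseq, *bexp, *hfrac, *hexflt;

    dseq = xsmprint("(%s+)", decdig);
    hseq = xsmprint("(%s+)", hexdig);
    bexp = xsmprint("([Pp]%s?%s)", sign, dseq);
    hfrac = xsmprint("((%s?%s%s)|(%s%s))", hseq, dot, hseq, hseq, dot);
    hexflt = xsmprint("0[Xx](%s|%s)%s", hfrac, hseq, bexp);
    free(dseq);
    free(hseq);
    free(bexp);
    free(hfrac);
    return hexflt;
}

/* Return 1 if all of text matches pattern, 0 if it does not,
 * and -1 if the pattern fails to compile. */
int
matches(const char *pattern, const char *text)
{
    regex_t re;
    char *anchored;
    int rc;

    anchored = xsmprint("^(%s)$", pattern);     /* force a whole-string match */
    rc = regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB);
    free(anchored);
    if (rc != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With p = build_hexflt(), matches(p, "0x1.8p3") and matches(p, "0x.8P+1")
return 1, while matches(p, "0x1.8") returns 0 because the pattern, like
C99, requires the binary exponent.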
I know that regcomp builds up the Reprog by combining subprograms with
catenation and alternation &c., but I’d be loath to try tinkering
there directly without a much better understanding of the algorithm.
I’ve glanced through the documents at swtch.com/????? and the regcomp
source code, but just haven’t had the time for an in-depth study.
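
Short of touching Reprog, the catenation/alternation structure mentioned
above can be mimicked outside the library: keep a small tree of named
pieces and flatten it to a pattern string only when it is handed to
regcomp. The sketch below is one possible shape for that, in portable C.
Everything in it (the Node type, the leaf/cat/alt/opt/plus helpers, render)
is invented for illustration and says nothing about how regcomp lays out
its own subprograms.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical node kinds; not the layout of Plan 9's Reprog. */
enum { LEAF, CAT, ALT, OPT, PLUS };

typedef struct Node Node;
struct Node {
    int kind;
    const char *text;       /* LEAF only: a raw fragment such as "[0-9]" */
    Node *left, *right;     /* operands; right is NULL for OPT and PLUS */
};

/* Allocate one node. Nodes may be shared between parents and are never
 * freed in this sketch. */
Node *
mk(int kind, const char *text, Node *left, Node *right)
{
    Node *n = malloc(sizeof *n);

    if (n == NULL)
        abort();
    n->kind = kind;
    n->text = text;
    n->left = left;
    n->right = right;
    return n;
}

#define leaf(s)     mk(LEAF, (s), NULL, NULL)
#define cat(a, b)   mk(CAT, NULL, (a), (b))
#define alt(a, b)   mk(ALT, NULL, (a), (b))
#define opt(a)      mk(OPT, NULL, (a), NULL)
#define plus(a)     mk(PLUS, NULL, (a), NULL)

/* Append s to the string in buf (cap bytes); abort rather than truncate. */
void
emit(char *buf, size_t cap, const char *s)
{
    if (strlen(buf) + strlen(s) + 1 > cap)
        abort();
    strcat(buf, s);
}

/* Flatten the tree into buf, parenthesizing every operator so that
 * precedence never depends on what the pieces contain. */
void
render(Node *n, char *buf, size_t cap)
{
    switch (n->kind) {
    case LEAF:
        emit(buf, cap, n->text);
        break;
    case CAT:
    case ALT:
        emit(buf, cap, "(");
        render(n->left, buf, cap);
        if (n->kind == ALT)
            emit(buf, cap, "|");
        render(n->right, buf, cap);
        emit(buf, cap, ")");
        break;
    case OPT:
    case PLUS:
        emit(buf, cap, "(");
        render(n->left, buf, cap);
        emit(buf, cap, n->kind == OPT ? ")?" : ")+");
        break;
    }
}
```

For instance, with dseq = plus(leaf("[0-9]")), rendering
cat(leaf("[Ee]"), cat(opt(leaf("[+-]")), dseq)) into an empty buffer
yields ([Ee](([+-])?([0-9])+)), which can then be passed to regcomp as a
whole.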
Would such a project be a worthwhile use of time? (Might it develop
into the asteroid to kill the dinosaur waiting for it?)
--Joel