On 2/22/07, Russ Cox <[EMAIL PROTECTED]> wrote:
> The Plan 9 regexp library matches the old Unix egrep command.
> Any regexp you'd try under Plan 9 should work with new egreps,
> though not vice versa -- new egreps tend to have newfangled
> additions like [:upper:] and \w and {4,6} for repetition.

This came up as I was implementing my C lexer for the compilers class
I'm taking.  How hard would it be to allow access to regcomp(2)'s
internals, so I could build up a regexp part by part, a la lex?

For example, to recognize C99 hexadecimal floating-point constants, I
wrote a second program that builds up the regexp piece by piece using
smprint(2), then compiles the whole thing:

        char    *decdig = "([0-9])",
                *hexdig = "([0-9A-Fa-f])",
                *sign = "([+\\-])",
                *dot = "(\\.)",
                *dseq, *dexp, *dfrac, *decflt,
                *hseq, *bexp, *hfrac, *hexflt;
        dseq = smprint("(%s+)", decdig);
        dexp = smprint("([Ee]%s?%s)", sign, dseq);
        dfrac = smprint("((%s?%s%s)|(%s%s))", dseq, dot, dseq, dseq, dot);
        decflt = smprint("(%s%s?)|(%s%s)", dfrac, dexp, dseq, dexp);
        regcomp(decflt);        // make sure it compiles
        print("decfloat: %s\n", decflt);
        
        hseq = smprint("(%s+)", hexdig);
        bexp = smprint("([Pp]%s?%s)", sign, dseq);
        hfrac = smprint("((%s?%s%s)|(%s%s))", hseq, dot, hseq, hseq, dot);
        hexflt = smprint("0[Xx](%s|%s)%s", hfrac, hseq, bexp);
        regcomp(hexflt);        // make sure it compiles
        print("hexfloat: %s\n", hexflt);

I know that regcomp builds up the Reprog by combining subprograms with
catenation and alternation &c., but I’d be loath to try tinkering
there directly without a much better understanding of the algorithm.
I’ve glanced through the documents at swtch.com/????? and the regcomp
source code, but just haven’t had the time for an in-depth study.
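
As a rough illustration of the part-by-part idea without touching
Reprog at all, the catenation and alternation steps can be kept at the
string level behind a few small combinators.  This is only a sketch
under stated assumptions: it uses POSIX regcomp(3) from <regex.h>
rather than the Plan 9 library, and xsmprint, re_cat, re_alt, re_opt,
re_plus, hexfloat and matches are hypothetical names invented here,
not part of either library.  Intermediate strings are deliberately
leaked to keep it short.

```c
#include <assert.h>
#include <regex.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical portable stand-in for Plan 9's smprint(2):
 * returns a malloc'd formatted string, aborting on failure. */
char *
xsmprint(const char *fmt, ...)
{
	va_list ap;
	int n;
	char *s;

	va_start(ap, fmt);
	n = vsnprintf(NULL, 0, fmt, ap);	/* measure first */
	va_end(ap);
	assert(n >= 0);
	s = malloc((size_t)n + 1);
	assert(s != NULL);
	va_start(ap, fmt);
	vsnprintf(s, (size_t)n + 1, fmt, ap);
	va_end(ap);
	return s;
}

/* Hypothetical combinators.  Each result is parenthesized so it can
 * be nested safely inside any other combinator. */
char *re_cat(const char *a, const char *b) { return xsmprint("(%s%s)", a, b); }
char *re_alt(const char *a, const char *b) { return xsmprint("(%s|%s)", a, b); }
char *re_opt(const char *a) { return xsmprint("(%s?)", a); }
char *re_plus(const char *a) { return xsmprint("(%s+)", a); }

/* Build an anchored pattern for C99 hexadecimal floating constants,
 * mirroring hseq, bexp and hfrac from the program above.  POSIX
 * bracket expressions treat backslash literally, so the sign class
 * is written [+-] here instead of [+\-]. */
char *
hexfloat(void)
{
	char *hseq = re_plus("[0-9A-Fa-f]");
	char *dseq = re_plus("[0-9]");
	char *bexp = re_cat("[Pp]", re_cat(re_opt("[+-]"), dseq));
	char *hfrac = re_alt(re_cat(re_opt(hseq), re_cat("\\.", hseq)),
		re_cat(hseq, "\\."));

	return xsmprint("^0[Xx]%s%s$", re_alt(hfrac, hseq), bexp);
}

/* Return 1 if s matches pattern, 0 if not, -1 if pattern is invalid. */
int
matches(const char *pattern, const char *s)
{
	regex_t re;
	int rc;

	if(regcomp(&re, pattern, REG_EXTENDED|REG_NOSUB) != 0)
		return -1;
	rc = regexec(&re, s, 0, NULL, 0);
	regfree(&re);
	return rc == 0;
}
```

With these, matches(hexfloat(), "0x1.8p3") should accept and
matches(hexfloat(), "0x1.8") should reject, since the binary exponent
is mandatory.  The obvious cost of staying at the string level is that
every piece is re-parsed when the final pattern is compiled, which is
exactly what access to regcomp's internals would avoid.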

Would such a project be a worthwhile use of time?  (Might it develop
into the asteroid to kill the dinosaur waiting for it?)

--Joel
