Come to think of it, why _are_ you thinking about lexing so hard now?
You're working on a bootstrap compiler, right? So wouldn't any old
slow lexer with broken Unicode cut it for now? Wouldn't it be more fun
to implement a serious lexer in BitC itself?

Maybe you already answered this, but I couldn't find where.

On Mon, Apr 28, 2014 at 5:36 PM, Jonathan S. Shapiro <[email protected]> wrote:
> OK. One last bit on regexps, and then I'll go crawl under a rock for a while
> more. This one has to do with prioritized choice.
>
> Classic PCRE actually doesn't match regular expressions at all.
> Backreferences require a pushdown automata. Russ took most of those features
> out of his input specification, so I'm going to ignore them here.
>
> PCRE implements (|) as "prioritized choice" rather than adopting the
> traditional RE rule of "longest match". Ironically, adopting the prioritized
> choice model simplifies the token matching problem, because it resolves the
> problem of multiple matches, and thereby induces a clear rule about which
> token ID should prevail on match. We can obviously add a priority to the
> thread context, but that eliminates our ability to avoid redundant additions
> of matching threads. The number of total possible priorities is bounded by
> the length of the input RE, but still, it's messy.
>
> For a while, I thought we might get away with it on the grounds that
> priority seemed to be a strict function of the PC value. Unfortunately, the
> fine print of the subpattern rule says that the the higher priority
> subpattern only wins if the remainder of the regular expression matches.
> It's pretty fundamentally a recursive, backtracking implementation. Offhand,
> I see no way (in general) to build a deterministic automata when this rule
> is in effect. I can see ways to optimize/simplify a PCRE in such a way that
> we can recognize when prioritized choice isn't necessary, but I don't see a
> general way to turn an NFA with prioritized matching into a DFA.
>
> This may be a good thing for now, because it means that I can stop screwing
> around with re2/PCRE sooner, but given my head is in it at the moment, it
> would sure be nice not to do that.
>
> Looking at the pcrematch documentation, it appears to me that
> pcre_dfa_exec() doesn't actually implement a DFA at all.
>
> Anybody see a way around this? I confess that I do not, and it seems that
> pcre "dfa" does not either.
>
>
> shap
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev

Reply via email to