Come to think of it, why _are_ you thinking about lexing so hard now? You're working on a bootstrap compiler, right? So wouldn't any old slow lexer with broken Unicode cut it for now? Wouldn't it be more fun to implement a serious lexer in BitC itself?
Maybe you already answered this, but I couldn't find where. On Mon, Apr 28, 2014 at 5:36 PM, Jonathan S. Shapiro <[email protected]> wrote: > OK. One last bit on regexps, and then I'll go crawl under a rock for a while > more. This one has to do with prioritized choice. > > Classic PCRE actually doesn't match regular expressions at all. > Backreferences require a pushdown automata. Russ took most of those features > out of his input specification, so I'm going to ignore them here. > > PCRE implements (|) as "prioritized choice" rather than adopting the > traditional RE rule of "longest match". Ironically, adopting the prioritized > choice model simplifies the token matching problem, because it resolves the > problem of multiple matches, and thereby induces a clear rule about which > token ID should prevail on match. We can obviously add a priority to the > thread context, but that eliminates our ability to avoid redundant additions > of matching threads. The number of total possible priorities is bounded by > the length of the input RE, but still, it's messy. > > For a while, I thought we might get away with it on the grounds that > priority seemed to be a strict function of the PC value. Unfortunately, the > fine print of the subpattern rule says that the the higher priority > subpattern only wins if the remainder of the regular expression matches. > It's pretty fundamentally a recursive, backtracking implementation. Offhand, > I see no way (in general) to build a deterministic automata when this rule > is in effect. I can see ways to optimize/simplify a PCRE in such a way that > we can recognize when prioritized choice isn't necessary, but I don't see a > general way to turn an NFA with prioritized matching into a DFA. > > This may be a good thing for now, because it means that I can stop screwing > around with re2/PCRE sooner, but given my head is in it at the moment, it > would sure be nice not to do that. > > Looking at the pcrematch documentation, it appears to me that > pcre_dfa_exec() doesn't actually implement a DFA at all. > > Anybody see a way around this? I confess that I do not, and it seems that > pcre "dfa" does not either. > > > shap _______________________________________________ bitc-dev mailing list [email protected] http://www.coyotos.org/mailman/listinfo/bitc-dev
