On Mon, Apr 28, 2014 at 4:37 PM, Matt Oliveri <[email protected]> wrote:
> Come to think of it, why _are_ you thinking about lexing so hard now? > You're working on a bootstrap compiler, right? So wouldn't any old > slow lexer with broken Unicode cut it for now? Wouldn't it be more fun > to implement a serious lexer in BitC itself? It's a fair question with a couple of different answers. In all likelihood none of them are terribly compelling. I already have a hand-crafted bootstrap lexer in C++ that does a perfectly fine job. I actually started in to this because I need a parser generator. And yes, I already have a YACC-based parser, and it has been a source of problems. I had been thinking to extract all the keywords from the parser description automatically. The catch is that (a) it's hard to take the resulting list and usefully drop it into a hand-crafted parser, and (b) I didn't realize going in how far PCRE has diverged from honest finite automata. Around the time I started thinking about all of this I happened to be reading Russ Cox's notes on RE2. Can't use that directly, because it doesn't implement a notion of token IDs, so I figured I'd kick the can a bit and give it a try. The result hasn't been immediately useful for BitC, but it's been horrifically educational for me. :-) At least now I know what the BitC implementation is going to look like, and actually, the effort revealed motivation for a couple of idioms that are still difficult to express in BitC. The sheer complexity of the PCRE specification is astonishing. That said, it's darned useful to have a backtracking analyzer for certain kinds of things. The thing I didn't realize going in is that there simply *isn't* a non-recursive implementation even for simple PCRE patterns. Sure, you can simplify down and find cases for which recursion isn't required, but it's a lot of bother to do that. Jonathan
_______________________________________________ bitc-dev mailing list [email protected] http://www.coyotos.org/mailman/listinfo/bitc-dev
