Keean: You make a couple of assertions in your response that I can't work my way through. Could very well be that I just haven't thought this through hard enough yet. Apologies, but I think this will make more sense if I take your comments out of order.
BUFFERING

I *did* note the smileys, but I want to make sure we're laughing at the same joke. How do you buffer a terabyte stream arriving over a network port (thus not backwards seekable) on a machine with 4GB to 8GB of memory?

DO/DONE

You wrote:

> I dont see the do / done problem. One looks for "d" "o" (optional space)
> some other token, the other "d" "o" "n" "e", there is no ambiguity. The
> "do" parser should clearly reject "don...".

We're doing our token matching with simple comparison on the leading string. In order to realize that the input 'd', 'o', 'n', 'e' should not satisfy "do", we need to know one of two things:

1. The list of all tokens, by which we might come to know that there is a
   potential longer match, or
2. The list of potential token separator/terminator characters, by which we
   would know that the character after 'o' must be consumed under the
   maximal munch rule.

The reason it's a problem is that I don't see either of those requirements satisfied. There is no tokenizer state, so we don't know the token terminator characters. Meanwhile, the "do" and "done" matching rules are worlds apart, with no obvious way to combine them into a single regexp.

Oh. I see how the GLSL parser did it. The keyword combinator checks that the matched string is not trailed by a keyword-continuation character. Is that the short answer? (A sketch of the check I have in mind is at the end of this message.) I also see that this disambiguates keywords from other identifiers *provided* we match the keywords first.

VIRTUAL TOKENIZATION

You wrote:

> I use virtual tokenization, which is a parser-combinator. IE I take a
> parser description in combinators, and convert it into a tokenizer. It does
> both in one pass by effectively using lazy tokenization.

I'm probably missing something perfectly obvious, but I don't see how this is done. I understand that if you can gather the tokens together into a list of regexps, you can generate a tokenizer. I also understand that the tokenizer can be lazily turned into optimized code. What I don't see is how to gather the regexps.

By the time the various constructs like (keyword "if") appear, they are sitting in completely disconnected [sub]parsers. Later those [sub]parsers are joined by connecting up the resulting functions. At that point you can no longer walk the parsers to locate the contained keyword matchers.

I can certainly see how to do it if the type of "parser" is not (stream, state) -> result, that is, if the return value of a combinator is not of function type (second sketch below).

So: how are you able to extract the various keyword and similar regexps for merging?

shap
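P.S. To make sure I've understood the keyword-continuation check, here is a minimal sketch of what I think is going on, written in Haskell with a hand-rolled Parser type. The type and names are mine, purely for illustration, not anything taken from the GLSL parser:

    import Data.Char (isAlphaNum)

    -- A parser is a function from the remaining input to a result plus the
    -- rest of the input; Nothing on failure.
    newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

    -- Match an exact string by simple comparison on the leading input.
    lit :: String -> Parser String
    lit s = Parser $ \inp ->
      case splitAt (length s) inp of
        (pre, rest) | pre == s -> Just (s, rest)
        _                      -> Nothing

    -- A keyword matches the literal text *and* then requires that the next
    -- character cannot continue an identifier.  This is the check that lets
    -- (keyword "do") reject "done" without any global token list or
    -- tokenizer state.
    keyword :: String -> Parser String
    keyword s = Parser $ \inp ->
      case runParser (lit s) inp of
        Just (tok, rest) | not (continuesIdent rest) -> Just (tok, rest)
        _                                            -> Nothing
      where
        continuesIdent (c:_) = isAlphaNum c || c == '_'
        continuesIdent []    = False

With that, runParser (keyword "do") "do x" succeeds while runParser (keyword "do") "done x" fails, which is the behavior I was missing.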
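And, for contrast, a sketch of the only way I can see the regexp extraction working: if the combinators build a data structure rather than a bare function, the keyword/regexp leaves can still be collected after the [sub]parsers have been joined. Again, the constructors and names here are mine, just to illustrate the distinction I'm asking about:

    -- A "deep" parser representation: combinators build data, not closures.
    data P
      = Keyword String      -- literal token, e.g. Keyword "if"
      | Regexp  String      -- token described by a regular expression
      | Seq P P             -- sequencing
      | Alt P P             -- alternation
      | Many P              -- repetition

    -- Walking the structure gathers every token description, which is what
    -- a tokenizer generator needs.  With parser-as-function this walk is
    -- impossible: once (keyword "if") has been turned into a closure, the
    -- literal text inside it is unreachable.
    tokens :: P -> [String]
    tokens (Keyword s) = [s]
    tokens (Regexp r)  = [r]
    tokens (Seq a b)   = tokens a ++ tokens b
    tokens (Alt a b)   = tokens a ++ tokens b
    tokens (Many p)    = tokens p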