cool. good info. i had not considered bytecode approach; would make retargeting trivial. Ter On Feb 22, 2010, at 2:00 PM, Ron Burk wrote:
>> That is interesting...can the interpreter fetch/exec loop fit in L1 cache? >> maybe... > > Things always get more complicated by the time you have a working product, > but I had these wild-ass guesses in my notes: > > Back of envelope calculation: 2KB for basic parser framework, > suppose average of 3 rules per non-terminal, average of 5 > non-terminal lookups per non-terminal (each with an associated > 2-byte "address" to "call"), average of 5 symbols > per rule, summing to 3 * (10*2 + 5*2) = 90 bytes per non-terminal. > Let's round that up to 100 bytes per non-terminal to allow for > action/reduce opcodes, tail recursion optimization opcodes, etc. > > Say about 60 non-terminals for ANSI C parser. That would come > to only 6KB+2KB=8KB! Leaves a fair amount of room for user actions, > parser stack, value stack, etc. especially if you're just constructing > a parse tree. Even double that to be more pessimistic and 16KB to > parse ANSI C sounds pretty small, leaving 3/4 of L1 cache on many > modern Intel processors (and of course, if the entire parser fits in > 16KB, something less than 100% of it will be "hot" enough to be > regularly demanded in cache; e.g., "struct" tends to be needed > during the first portion of a C source file, then later maybe not > at all). > > I do intend to make each opcode do as much as possible, and the fact > that I can boil a large expression grammar down to just a few rules helps > as well. For C code generation, for example, one opcode would scan > the list of eligible next terminals (simple list scans instead of sparse table > lookups or compressed table schemes). For Java code generation, > it appears likely just as well to replace the list-scanning opcode with > one switch statement per non-terminal, given the lack of static array > initialization and the fact that two of the highest-level Java VM opcodes > are devoted to switch statement support. > >> i was thinking SLR(0) also instead of the prec climbing >> thing for expressions. I'll think about this hard. > > So far, I have not found it objectionable to require no null rules > in the expression part of the grammar; you can always wrap > that sub-grammar with another nonterminal that is nullable. > I also have not found any popular language expression construct > whose ambiguous SLR(0) automaton can't correctly be > disambiguated by the classic operator precedence table algorithm, > at least on my whiteboard. I guess the comma operator in C > requires a little massaging because it's ambiguous with function > parameter separation, but that's true in an LALR grammar for > ANSI C as well. List: http://www.antlr.org/mailman/listinfo/antlr-interest Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address -- You received this message because you are subscribed to the Google Groups "il-antlr-interest" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/il-antlr-interest?hl=en.
