On Apr 30, 2010, at 4:27 PM, Graham Wideman wrote:
> This prompts me to wonder how debuggable these lexers will be? Currently a
> certain amount of troubleshooting of lexing/parsing can be done by inspecting
> the generated lexer source, single-stepping it and so on.
>
> If you move to encoding the lexer logic in bytecodes, does the generated
> lexer source become an inscrutable black box? Or is there still meaningful
> source code to examine, trace etc?
Yup. The bytecode is actually easier to read than the Java ;)
lexer grammar L2;
A : 'ab';
B : 'a'..'z'+ ;
I : '0'..'9'+ ;
yields:
0000: split 9, 16, 29 // says 3 paths are possible
0009: match8 'a'
0011: match8 'b'
0013: accept 4
0016: range8 'a', 'z'
0019: split 16, 26
0026: accept 5
0029: range8 '0', '9'
0032: split 29, 39 // go back or fall out of loop into accept state
0039: accept 6
Is that what you mean? It's 1-to-1 with the grammar, taken almost verbatim
from Russ Cox's description of VM-based NFA simulation.
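To make the single-stepping question concrete, here is a minimal sketch of how a listing like the one above could be interpreted, in the style of a thread-list (Pike) VM. It is not ANTLR's actual interpreter: the `PROGRAM` dictionary, the function names, the `trace` argument, and the rule that the earliest alternative wins among equal-length matches are all assumptions made for illustration.

```python
# Hypothetical in-memory form of the listing: address -> (opcode, operands...).
PROGRAM = {
    0:  ("split", 9, 16, 29),
    9:  ("match8", "a"),
    11: ("match8", "b"),
    13: ("accept", 4),
    16: ("range8", "a", "z"),
    19: ("split", 16, 26),
    26: ("accept", 5),
    29: ("range8", "0", "9"),
    32: ("split", 29, 39),
    39: ("accept", 6),
}
ADDRESSES = sorted(PROGRAM)


def next_addr(addr):
    """Address of the instruction that follows addr in the listing."""
    return ADDRESSES[ADDRESSES.index(addr) + 1]


def add_thread(addr, waiting, accepted):
    """Follow split edges from addr without consuming any input."""
    op = PROGRAM[addr]
    if op[0] == "split":
        for target in op[1:]:
            add_thread(target, waiting, accepted)
    elif op[0] == "accept":
        if op[1] not in accepted:
            accepted.append(op[1])
    elif addr not in waiting:
        waiting.append(addr)  # thread is parked on a match8/range8


def match(text, trace=None):
    """Return (token_type, length) of the longest match at the start of text, or None."""
    waiting, accepted = [], []
    add_thread(0, waiting, accepted)
    best, pos = None, 0
    while waiting and pos < len(text):
        ch = text[pos]
        if trace is not None:
            trace.append((ch, list(waiting)))  # one record per single step
        advanced, accepted = [], []
        for addr in waiting:
            op = PROGRAM[addr]
            if op[0] == "match8":
                ok = ch == op[1]
            else:  # range8
                ok = op[1] <= ch <= op[2]
            if ok:
                add_thread(next_addr(addr), advanced, accepted)
        pos += 1
        if accepted:
            # Assumption: among accepts of equal length, the earliest thread wins.
            best = (accepted[0], pos)
        waiting = advanced
    return best


steps = []
print(match("ab", trace=steps), steps)
```

Passing a list as `trace` records, for each input character, the addresses of the instructions that were waiting on it, which is roughly the view you would get by single-stepping the bytecode.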
ANTLR v4 uses 42 bytes to encode the entire L2 grammar. ANTLR v3 generates 246
lines of Java and 2709 bytes of Java .class file:
/tmp $ wc -l L2.java
246 L2.java
/tmp $ ls -l L2.class
-rw-r--r-- 1 parrt wheel 2709 Apr 30 16:39 L2.class
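The 42-byte figure can be cross-checked against the addresses in the bytecode listing. The sketch below only does arithmetic on those addresses. The size of the final `accept` is inferred by assuming the program ends at byte 42, and the variable names are made up for illustration.

```python
# Addresses and opcodes copied from the bytecode listing.
ADDRESSES = [0, 9, 11, 13, 16, 19, 26, 29, 32, 39]
OPCODES = ["split", "match8", "match8", "accept", "range8",
           "split", "accept", "range8", "split", "accept"]
TOTAL_BYTES = 42  # stated size; assumed to be the end of the last instruction

# Each instruction's size is the gap to the next address (or to the end).
ends = ADDRESSES[1:] + [TOTAL_BYTES]
sizes = [(op, end - start) for op, start, end in zip(OPCODES, ADDRESSES, ends)]

for op, size in sizes:
    print(f"{op:7s} {size} bytes")
print("total:", sum(size for _, size in sizes))
print("v3 .class bytes per v4 bytecode byte:", 2709 / TOTAL_BYTES)
```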
Ter