On Apr 30, 2010, at 4:27 PM, Graham Wideman wrote:
> This prompts me to wonder how debuggable these lexers will be?  Currently a 
> certain amount of troubleshooting of lexing/parsing can be done by inspecting 
> the generated lexer source, single-stepping it and so on.
> 
> If you move to encoding the lexer logic in bytecodes, does the generated 
> lexer source become an inscrutable black box?  Or is there still meaningful 
> source code to examine, trace etc?

Yup, there is still something meaningful to examine. The bytecode is actually easier to read than the Java ;)

lexer grammar L2;
A : 'ab';
B : 'a'..'z'+ ;
I : '0'..'9'+ ;

yields:

0000:   split         9, 16, 29   // says 3 paths are possible
0009:   match8        'a'
0011:   match8        'b'
0013:   accept        4
0016:   range8        'a', 'z'
0019:   split         16, 26
0026:   accept        5
0029:   range8        '0', '9'
0032:   split         29, 39 // go back or fall out of loop into accept state
0039:   accept        6

Is that what you mean?  It's 1-to-1 with the grammar, taken almost verbatim 
from Russ Cox's description of VM-based NFA simulation.
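
To make the listing easier to trace by hand, here is a minimal Python sketch of a 
thread-list NFA simulation in the style Russ Cox describes, run over the ten 
instructions above. It is not ANTLR's interpreter. The names PROGRAM, FALLTHROUGH, 
add_thread and next_token are made up for illustration, the accept operands are 
assumed to be token types, and the tie-break (longest match first, then the 
earliest rule) is an assumption too.

```python
# Hypothetical re-encoding of the listing: address -> (opcode, operands...).
PROGRAM = {
    0:  ("split", 9, 16, 29),
    9:  ("match8", "a"),
    11: ("match8", "b"),
    13: ("accept", 4),
    16: ("range8", "a", "z"),
    19: ("split", 16, 26),
    26: ("accept", 5),
    29: ("range8", "0", "9"),
    32: ("split", 29, 39),
    39: ("accept", 6),
}

# Address of the instruction that follows each instruction in the listing.
_ADDRS = sorted(PROGRAM)
FALLTHROUGH = dict(zip(_ADDRS, _ADDRS[1:]))


def add_thread(threads, seen, pc):
    """Queue pc, expanding split instructions since they consume no input."""
    if pc in seen:
        return
    seen.add(pc)
    op = PROGRAM[pc]
    if op[0] == "split":
        for target in op[1:]:
            add_thread(threads, seen, target)
    else:
        threads.append(pc)


def matches(op, ch):
    """True if a consuming instruction accepts the character ch."""
    if op[0] == "match8":
        return ch == op[1]
    if op[0] == "range8":
        return op[1] <= ch <= op[2]
    return False


def next_token(text, start=0):
    """Return (accept operand, end index) of the longest match, or None."""
    threads, seen = [], set()
    add_thread(threads, seen, 0)
    best = None
    pos = start
    while threads:
        ch = text[pos] if pos < len(text) else None
        following, seen = [], set()
        accepted_here = False
        for pc in threads:
            op = PROGRAM[pc]
            if op[0] == "accept":
                # Assumed tie-break: the first thread to accept at this
                # position wins; a later position overrides an earlier one.
                if not accepted_here:
                    best = (op[1], pos)
                    accepted_here = True
            elif ch is not None and matches(op, ch):
                add_thread(following, seen, FALLTHROUGH[pc])
        threads = following
        pos += 1
    return best


if __name__ == "__main__":
    for sample in ("ab", "abc", "42", "?"):
        print(repr(sample), next_token(sample))
```

In this sketch "ab" ends at accept 4 (rule A) and "abc" at accept 5 (rule B), so 
single-stepping amounts to watching the list of live addresses shrink and grow one 
character at a time.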

ANTLR v4 uses 42 bytes to encode the entire L2 grammar.  ANTLR v3 generates 246 
lines of Java and a 2709-byte Java .class file:

/tmp $ wc -l L2.java
     246 L2.java
/tmp $ ls -l L2.class
-rw-r--r--  1 parrt  wheel  2709 Apr 30 16:39 L2.class
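
The 42 can be checked against the addresses in the bytecode listing. A small 
Python sketch, assuming instruction sizes read off the address gaps (match8 takes 
2 bytes, range8 and accept take 3, split takes 3 plus 2 per target). The actual 
operand layout is not spelled out here, so treat these sizes as inferred, and the 
names size, LISTING and total_size as hypothetical.

```python
def size(op, operand_count):
    """Bytes per instruction, inferred from address gaps (an assumption)."""
    if op == "match8":
        return 2
    if op in ("range8", "accept"):
        return 3
    if op == "split":
        return 3 + 2 * operand_count
    raise ValueError(f"unknown opcode: {op}")


# (address shown in the listing, opcode, number of operands)
LISTING = [
    (0, "split", 3),
    (9, "match8", 1),
    (11, "match8", 1),
    (13, "accept", 1),
    (16, "range8", 2),
    (19, "split", 2),
    (26, "accept", 1),
    (29, "range8", 2),
    (32, "split", 2),
    (39, "accept", 1),
]


def total_size(listing):
    """Walk the listing, check each address, and return the total byte count."""
    addr = 0
    for expected, op, operand_count in listing:
        if addr != expected:
            raise ValueError(f"expected address {expected}, computed {addr}")
        addr += size(op, operand_count)
    return addr


if __name__ == "__main__":
    print(total_size(LISTING))  # 42
```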

Ter
