The size of the DFA tables is known about, but despite cache misses, you won't notice the effect unless you have huge inputs, in which case there will be plenty of other cache misses anyway. However, when you end up with a large DFA set, it usually (not always) means that the lexer can be specified 'better' (in terms of allowing ANTLR to deal with it better).
Can you post your grammar so we can see it? Look for things like: C : A?A; Which could be C : A A? ; And specifying too much restriction in the character sets for tokens because it says that in the spec (you want to accept the maximum range that is not ambiguous, then issue a semantic error, in keeping with pushing errors as high up the tool chain as you can because you tend to have more context and better errors that way. To produce the range structure would require changes in the way the DFA is given to the code generating templates, which I have no control of. However, in v4, there is an improved lexer arrangement, which will avoid these huge tables, and the code generator is more sophisticated in terms of what an individual target can do before it generates code via the templates (right now, there is not much of anything that a target can do other than use the templates). Jim > -----Original Message----- > From: [email protected] [mailto:antlr-interest- > [email protected]] On Behalf Of Andreas Jonsson > Sent: Thursday, August 05, 2010 1:54 AM > To: [email protected] > Subject: [antlr-interest] Compression of dfa tables > > Hi, > > I have just started to write a parser using antlr3 with C as target. > The lexer is growing quickly in size due to the dfa tables. With 61 tables the > size of the lexer is over 4MB. My feeling is that the consequence will be that > performance will suffer due to a large number of cache misses. > > But the tables contains 65536 entries, where almost all elements have the > same value. > > Someone might have thought of this idea before and maybe it have been > rejected for good reasons, but why not use a sorted set of ranges instead of > a flat table? For instance: > > struct DFA_RANGE { > ANTLR3_UINT16 start; > ANTLR3_UINT16 end; > ANTLR3_UINT32 state; > }; > > struct DFA_TABLE { > ANTLR3_UINT32 length; > struct DFA_RANGE ranges[]; > }; > > And you would get a very compact data structure: > > static const struct DFA_TABLE dfa6_T2 { 5, {{0, 34, 20}, {35, 35, 28}, {36, 60, 20}, > {61, 61, -1}, {62, 65535, 20} }}; > > Maybe there are some cases where such a data structure will be larger than > an ordinary table? In that case you can, of course, add the possibility to fall > back to an ordinary table. > > Best regards, > > Andreas Jonsson > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- > email-address List: http://www.antlr.org/mailman/listinfo/antlr-interest Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address -- You received this message because you are subscribed to the Google Groups "il-antlr-interest" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/il-antlr-interest?hl=en.
