> Mike Samuel wrote: > > I'd like to derive a state machine that recognizes a combined lexical > > grammar of JS/HTML/CSS (hacked so that JS has a regular lexical > > grammar) and a mapping from states to the production they're part of. > > I need to keep the number of states small. > > > > I saw something tantalizing about "dfaState() String Template" at > > http://www.antlr.org/wiki/display/ANTLR3/How+to+build+an+ANTLR+code+generation+target > > but am still unsure how to proceed. > > > > Is this possible with ANTLR, and if so, does anyone know of existing > > code I could adapt? > > > > For background, my end goal is to bolt string interpolation onto > > javascript, but in a way that doesn't introduce XSS problems as > > described at > > http://google-caja.googlecode.com/svn/changes/mikesamuel/string-interpolation-29-Jan-2008/trunk/src/js/com/google/caja/interp/index.html
David-Sarah Hopwood wrote: > Sounds like a fun project. > ANTLR normally only generates explicit DFAs when there is a nontrivial > decision between alternatives of a rule. In the case of the ECMAScript 5 > lexical grammar I'm using, it only generates two DFAs, one for the main > tokens rule, and one for DecimalLiteral. > You can get ANTLRWorks to show a state machine diagram for any particular > decision, or for the tokens rule, but they're not particularly > enlightening. In principle, ANTLR computes all the information needed to > produce a pure FSM lexer, but I don't think it provides any way to actually > generate one, at least not without a lot of work. If that's definitely > what you need, ANTLR is probably not the right tool. Thanks for the response, David-Sarah. Fancy running into you here. I guess you're using ANTLR for Jacaranda then? How do you get a lexical grammar to distinguish division operators/regex literal using ES5 rules, or are you using the lexical grammar from Waldemar's JS 1.9 then? I poked about a bit more, and when I generate a lexer, I get code that looks like the below which sees to create instances of org.antlr.runtime.DFA as described at http://www.antlr.org/api/Java/classorg_1_1antlr_1_1runtime_1_1_d_f_a.html . I think I can probably poke at those to produce both (1) an alphabet reduction, and (2) a transition table. It won't integrate nicely into a build from grammar, and it's hacky, but if the DFA just isn't exposed to code-generation targets, then it'll be easier than trying to engineer a way through the front door. ====== static final short[] DFA19_eot = DFA.unpackEncodedString(DFA19_eotS); // more unpacking elided static { int numStates = DFA19_transitionS.length; DFA19_transition = new short[numStates][]; for (int i=0; i<numStates; i++) { DFA19_transition[i] = DFA.unpackEncodedString(DFA19_transitionS[i]); } } class DFA19 extends DFA { public DFA19(BaseRecognizer recognizer) { this.recognizer = recognizer; this.decisionNumber = 19; this.eot = DFA19_eot; // more grabbing of statics elided } public String getDescription() { return /* large chunks of my grammar elided */; } } List: http://www.antlr.org/mailman/listinfo/antlr-interest Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "il-antlr-interest" group. To post to this group, send email to [email protected] To unsubscribe from this group, send email to [email protected] For more options, visit this group at http://groups.google.com/group/il-antlr-interest?hl=en -~----------~----~----~----~------~----~------~--~---
