Greetings!

On Sun, 2010-01-10 at 10:04 +0800, Michael Richter wrote:
> 2010/1/9 Kay Röpke <[email protected]>
>
> > On Jan 9, 2010, at 5:32 AM, Michael Richter wrote:
> >
> > > I keep coming across a pattern in a grammar I'm working on.  This pattern
> > > looks something like this:
> > >
> > >    - A production can be *A*.
> > >    - A production can be *B*.
> > >    - A production can be *A B.*
> > >
> > > In the grammar I'm transcribing this from, the notation used is *(A & B)*.
> > > Is there some convenient way to code that in ANTLR's EBNF notation?  I keep
> > > having to do *(A | B | A B)*.  As is that isn't all that onerous as-is, I
> > > admit, but imagine if A is five tokens long and B is also five tokens long
> > > and then imagine this kind of pattern happening about twenty times in the
> > > grammar.  Is there a way to concisely do this?
> >
> > What is the restriction on the parts of the production?
> > I.e. what differentiates a valid production from an invalid one?
>
> The restriction is exactly as I put it:  You can have A (where A is a
> multi-token set of specified order), B (where B is a multi-token set of
> specified order) or A B.  It *must* be in the order provided and A and B are
> fixed token sets.
>

1) Make a parser rule to recognize the sequence of tokens (and/or other
   parser rules) comprising A, and call it, say, recognize_A.

2) Make a parser rule to recognize the sequence of tokens (and/or other
   parser rules) comprising B, and call it, say, recognize_B.

3) Make a parser rule of the form:

       an_A_or_B_or_AB
           : recognize_A ( recognize_B )?
           | recognize_B
           ;

   Observe the proper left-factoring in the above...

4) Use the parser rule `an_A_or_B_or_AB` from 3) everywhere you have the
   (A | B | A B) stuff.

Note that if A and B share a common prefix (i.e. a common left-factor), you
will probably experience issues with the above 4 steps.

> Think of it this way: you're declaring a variable.  You have a token for the
> variable, then an optional type specification (A -- multiple tokens) and an
> optional initializer (B -- multiple tokens).  Both parts are optional, but
> you *must* have at least one and the declarations *must* be in the order of
> type then initializer if both are present.  The only way I've found to do it
> is (A | B | A B), but this is painful when A and B are more than one token
> in length and I've got about 20 of these things in the grammar.  This is
> just begging for typos.

This example REALLY FAILS for me.  It is hard for me to envision a language
that can initialize a variable (i.e. B) without any declaration of that
variable (i.e. A).  So having a bare naked B under the above example makes
no sense to me.

Maybe you meant something like:

    (A B? C?)

where A is the var decl, B is its type, and C is its initial value...

Hope this helps....

   -jbb
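
For anyone who wants to convince themselves that the left-factored rule in 3) accepts exactly the same inputs as (A | B | A B), here is a minimal sketch in Python rather than ANTLR. It is only an illustration: the token lists `A` and `B` and the helpers `match_seq` and `accepts` are hypothetical stand-ins, not anything from the original grammar, and the sketch assumes A and B share no common prefix (the caveat noted above).

```python
# Hypothetical token sequences standing in for A (type spec) and B (initializer).
A = [":", "TYPE"]
B = ["=", "EXPR"]


def match_seq(tokens, pos, seq):
    """Return the position just past seq if tokens[pos:] starts with seq, else None."""
    end = pos + len(seq)
    return end if tokens[pos:end] == seq else None


def an_A_or_B_or_AB(tokens, pos=0):
    """Hand-written analogue of: recognize_A ( recognize_B )? | recognize_B."""
    after_a = match_seq(tokens, pos, A)
    if after_a is not None:
        # A matched; B is optional after it.
        after_b = match_seq(tokens, after_a, B)
        return after_b if after_b is not None else after_a
    # A did not match, so B alone is the only remaining alternative.
    return match_seq(tokens, pos, B)


def accepts(tokens):
    """True if the rule consumes the whole token list."""
    return an_A_or_B_or_AB(tokens) == len(tokens)
```

With these definitions, `accepts(A)`, `accepts(B)` and `accepts(A + B)` hold, while the empty list and the out-of-order `B + A` are rejected, which matches the three productions listed in the question.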
