Hi,

One way to create island grammars in the current ANTLR release is to use delimiters. In other words, your language would be much easier to parse if it took input like this:
<command> [-timeout <NN>] [-notify <addr>] "shell_command"

Your outside-language constructs are now safely tucked away in quotes, so your lexer doesn't need to see them in any meaningful way. When you encounter a quoted string, you can produce a STRING token and let higher-level code deal with what it really means.

Cheers,
./m

Daniels, Troy (US SSA) wrote:
>
>> The basic issue seems to be that I want this basic form:
>>
>> <command> [-timeout <NN>] [-notify <email_address>]
>>
>> examples of which are:
>>
>> cleanlogs -timeout 20 -notify [email protected]
>> cleanup -timeout 10 -notify "[email protected] [email protected]"
>> deploy -notify [email protected] -list "compA compB compC"
>>
>> etc., along with the less-structured shell command types:
>>
>> // with timeout
>> shell -timeout 20 find /x/web -name '*.logs.bak' | xargs rm -f
>>
>> // without timeout
>> shell find /x/web -name '@*' | xargs mv /tmp/
>>
>
> I think this is the main problem that you need to resolve. The basic form is
> a highly structured, simple language that can easily be handled with a small
> grammar. The shell command is a complex language that could potentially
> match valid tokens in your simple language. (It's generally not illegal to
> have a shell command called "-notify", just a bad idea. But some user will
> do it anyway.)
>
> I think what you want to do is look at island grammars. These are typically
> used when you have two different languages with very different structure in
> the same input. (A common example is parsing javadoc comments within a Java
> file.) You also have a clean entry and exit point for the island grammar.
> The lexer normally parses the basic form. When the lexer encounters "shell",
> it switches to the island grammar to parse the remainder of the line, then
> switches back to the basic form for the next line.
> This allows you to have a grammar which consumes the rest of the line
> regardless of its content, without having to avoid conflicts with the basic
> form.
>
> I think either ANTLR 3.3 or ANTLR 4 will have better support for this.
>
>> The fact that I want an unquoted email address to be parsed
>> (i.e., [email protected] and not '[email protected]') seems to be causing
>> the problem.
>>
>> I'm going to try to redo things a bit more cleanly, try to
>> boil down the problem further, and repost if I still have problems.
>>
>
> If you try to keep everything in one grammar, I suspect you will continually
> have problems like this arise. If you fix the unquoted email, you might
> uncover another problem, or your next change will introduce a similar problem.
>
> Troy
>
>> Thanks for the help.
>>
>>
>> Bill
>>
>>
>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>> Unsubscribe:
>> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
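The delimiter approach suggested at the top of this reply can be sketched without ANTLR as a small hand-written lexer and parser in Python. It is only an illustration under assumptions: the token names (`WORD`, `OPTION`, `STRING`), the functions `tokenize` and `parse`, and the rule that the first quoted string is the shell command are all hypothetical and not part of any ANTLR API. What it shows is that the quoted shell command comes back as one `STRING` token whose contents the lexer never inspects.

```python
import re
from typing import Dict, List, Optional, Tuple

Token = Tuple[str, str]

# Hypothetical token set. Alternatives are tried in order at each position.
TOKEN_RE = re.compile(
    r'(?P<STRING>"(?:[^"\\]|\\.)*")'  # double-quoted string: opaque to the lexer
    r"|(?P<OPTION>-[A-Za-z]+)"         # option names such as -timeout, -notify
    r'|(?P<WORD>[^\s"]+)'              # command names, numbers, unquoted addresses
    r"|(?P<SKIP>\s+)"                  # whitespace between tokens
)


def tokenize(line: str) -> List[Token]:
    """Split one line of the delimited form into (kind, text) tokens."""
    tokens: List[Token] = []
    pos = 0
    while pos < len(line):
        m = TOKEN_RE.match(line, pos)
        if m is None:  # e.g. an unterminated quote
            raise SyntaxError(f"unexpected character {line[pos]!r} at column {pos}")
        kind, text = m.lastgroup, m.group()
        if kind == "STRING":
            tokens.append((kind, text[1:-1]))  # drop the quotes; never look inside
        elif kind != "SKIP":
            tokens.append((kind, text))
        pos = m.end()
    return tokens


def parse(line: str) -> Dict[str, object]:
    """Parse '<command> [-opt value]... ["shell_command"]' into a dict."""
    tokens = tokenize(line)
    if not tokens or tokens[0][0] != "WORD":
        raise SyntaxError("expected a command name")
    options: Dict[str, str] = {}
    shell: Optional[str] = None
    i = 1
    while i < len(tokens):
        kind, text = tokens[i]
        if kind == "OPTION":
            # Every option takes exactly one value (a WORD or a STRING).
            if i + 1 >= len(tokens) or tokens[i + 1][0] == "OPTION":
                raise SyntaxError(f"missing value for {text}")
            options[text[1:]] = tokens[i + 1][1]
            i += 2
        elif kind == "STRING" and shell is None:
            shell = text  # higher-level code decides what this string means
            i += 1
        else:
            raise SyntaxError(f"unexpected token {text!r}")
    return {"command": tokens[0][1], "options": options, "shell": shell}
```

With this sketch, `parse('shell "echo -notify"')` returns `echo -notify` as the shell command with no options, so a shell command that happens to contain `-notify` can never be mistaken for the option. Escapes inside the string are only skipped over, not interpreted; a real grammar would have to define that.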

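Troy's island-grammar description (switch on "shell", consume the rest of the line, switch back at the next line) can likewise be sketched as a hand-written lexer with explicit modes. This is not ANTLR code; ANTLR 4 offers a comparable mechanism through lexer modes (`mode`, `pushMode`, `popMode`). The mode names, token names, the functions `lex_line` and `lex`, and the choice to let `-timeout <NN>` be lexed normally right after `shell` (to match Bill's `shell -timeout 20 ...` example) are all assumptions of this sketch.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]

# Tokens of the structured "basic form": option names, quoted strings, bare words.
BASIC_TOKEN = re.compile(r'(?P<OPTION>-[A-Za-z]+)|(?P<STRING>"[^"]*")|(?P<WORD>[^\s"]+)')
# Assumption: the only option accepted between "shell" and the shell command itself.
SHELL_OPTION = re.compile(r"(?P<OPTION>-timeout)\s+(?P<VALUE>\d+)(?!\S)")


def lex_line(line: str) -> List[Token]:
    """Lex one line, switching to an island mode for the body of a shell command."""
    tokens: List[Token] = []
    mode = "BASIC"  # BASIC -> SHELL_OPTS -> ISLAND; every line starts in BASIC
    pos = 0
    while True:
        while pos < len(line) and line[pos].isspace():
            pos += 1  # skip whitespace between tokens
        if pos >= len(line):
            break
        if mode == "ISLAND":
            # Island mode: the rest of the line is one opaque token, whatever it contains.
            tokens.append(("SHELL_TEXT", line[pos:].rstrip()))
            break
        if mode == "SHELL_OPTS":
            m = SHELL_OPTION.match(line, pos)
            if m is None:
                mode = "ISLAND"  # first non-option text starts the shell command
                continue
            tokens.append(("OPTION", m.group("OPTION")))
            tokens.append(("WORD", m.group("VALUE")))
            pos = m.end()
            continue
        m = BASIC_TOKEN.match(line, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {line[pos]!r} at column {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
        if len(tokens) == 1 and tokens[0] == ("WORD", "shell"):
            mode = "SHELL_OPTS"  # entry point of the island
    return tokens


def lex(text: str) -> List[List[Token]]:
    """Lex each non-blank line separately, so the island always ends with its line."""
    return [lex_line(line) for line in text.splitlines() if line.strip()]
```

Here `lex_line("shell -notify oops")` yields the `shell` word followed by a single `SHELL_TEXT` token `-notify oops`, which is the "shell command called -notify" case from the thread. Because `lex` starts every line in `BASIC` mode, the next line is lexed as the structured form again.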