Thanks Cliff, that is very helpful. I currently don't have the time to learn the guts of the ANTLR C port in order to write that. Are there any examples of this for C/C++?
-Brian -----Original Message----- From: Cliff Hudson [mailto:[email protected]] Sent: Friday, May 07, 2010 15:58 To: [email protected]; [email protected] Subject: RE: [antlr-interest] Is parser control over the lexer possible? It's not at all difficult to let ANTLR handle indeterminate amounts of input. As has been suggested before, you will need to provide your own token stream to provide tokens at the rate you wish to give them. The problem is one of interfacing the pull model of the parser with the push model of your input. Lets assume you have a parser which runs indefinitely so long as you keep feeding it tokens, until you feed it the EOF token. For example: command : expr+; So long as the parser can receive tokens which match the grammar for expr, a call to command() will not return. In your case, you may very well have an infinite number of tokens, but you aren't wanting to provide them up-front. The solution is a token stream which provides tokens in batches you choose. You can do this by letting ANTLR control the input loop, as follows: Create your custom token stream. Pass it to the parser. Execute command(); In your token stream's implementation of NextToken() you will either: 1) If you have no stored tokens, ask the "user" for additional input. If you are reading from a file, this might consist of reading in a line and sending it to the lexer, which will spit out a bunch of tokens. In an interactive interpreter sort of scenario, you might prompt the user for a line of text. Store these tokens in some ordered collection, and return the first one; or 2) If you still have stored tokens, return the next one. In this way, ANTLR will parse only as much as you have to give at any time, and will wait for you to provide it more tokens if you don't have any at the moment. When you are finished with your session, you can pass EOF back from NextToken() which will tell the parser that there is no more input to be had. The call to command() will return. Note that as long as command() hasn't returned, ANTLR will be able to process rules and execute grammar actions when you return tokens to it from NextToken(). Does that make sense? On the subject of the parser controlling how the lexer works, this is not something that ANTLR is designed to do "out of the box". However, you control the token stream, and you control what happens when certain actions are invoked. You could construct a system whereby when an action is executed, it changes the lexer used by your NextToken() call to fetch additional tokens. Remember, you are utterly in control of which tokens are provided to the parser and - as illustrated by the technique above - when they are provided. You control which lexer you use and when. But I would caution you that such a system can become fairly complex - for instance you have to ensure that token IDs don't conflict between your lexers, which may require mapping them by hand in your NextToken call and altering the token IDs used by the generated parser itself. I've not tried this before, though I'd be a bit surprised if it couldn't be done. I personally have a system where there are two grammars and two parsers. The 'outer' grammar knows about areas where the inner grammar would be used. Those areas are lexed as a single token (quoted string in this case). When the rule in the outer grammar matches the quoted string, it uses a lexer and parser for the 'inner' grammar to process it. Hope this has helped. -----Original Message----- From: [email protected] [mailto:[email protected]] On Behalf Of Brian Catlin Sent: Thursday, May 06, 2010 7:37 PM To: [email protected] Subject: Re: [antlr-interest] Is parser control over the lexer possible? I too would be interested in changing that behavior. I use ANTLR as a command parser, so instantiating everything for every new command is a lot of overhead. I think ANTLR needs a mode in where it only fetches one line at a time, by calling a GetInput routine that we supply -Brian -----Original Message----- From: [email protected] [mailto:[email protected]] On Behalf Of Mike Matera Sent: Friday, May 07, 2010 10:13 To: Chris verBurg Cc: [email protected] Subject: Re: [antlr-interest] Is parser control over the lexer possible? Hi Chris, Yes, antlr reads the whole file into memory. I don't know how to stop it from doing that. Cheers ./m Chris verBurg wrote: > Hey all, > > OK, let me try a related but far less involved question: > > ANTLR tokenizes all input into an internal list before parsing > anything in that list. (Right?) Hence, it runs out of memory trying > to read my 6.2-million-line input file, because that list is huge. > What's the ANTLR way to handle such large input streams? > > Thanks, > -Chris > > > > > On Thu, Apr 29, 2010 at 4:33 PM, Chris verBurg <[email protected]>wrote: > > >> Hey guys, >> >> A question was posted a few days ago about dealing with an infinite >> input stream, and the suggestion was to subclass TokenStream so that >> it didn't read in all of the input upfront. >> >> I'm running into a similar problem, but before I go run off and >> subclass things I thought I'd see if there's a "best practice" for my >> situation. It also overlaps with the "how do I use keywords as identifiers<http://www.antlr.org/wiki/pages/viewpage.action?pageId=1741>" >> FAQ. >> >> I have a data-file grammar that recognizes strings, numbers, and a >> ton of keywords. Pretending "VERSION" and "LIMIT" are two keywords, >> here's (part >> of) the .g file: >> >> data_file: >> 'VERSION' STRING ';' >> | 'LIMIT' NUMBER ';' >> ; >> >> NUMBER: >> ('-'|'+')? ('0'..'9')+ >> | ('-'|'+')? ('0'..'9')* '.' ('0'..'9')* >> ; >> >> STRING: >> ('a'..'z' | 'A'..'Z' | '_' | '.' | '0'..'9')+ ; >> >> >> Problem input #1: >> >> VERSION 1.2 ; >> >> The "1.2" is lexed as a number instead of a string, so I get a parse error. >> >> Problem input #2: >> >> VERSION LIMIT ; >> >> The "LIMIT" is lexed as a keyword instead of a string, so I get a >> parse error. >> >> >> I saw the FAQ about keywords-as-identifiers, but I don't think it's >> helpful for me. For the NUMBER-that-should-be-a-STRING problem, >> there's no exact string I could pass to >> input.LT(1).getText().equals(), because it requires a regex to match >> a NUMBER. The other solution was to make an "identifier" >> rule to match all possibilities -- is the best solution here really >> to change the rule to 'VERSION' (STRING | NUMBER) ';'? >> >> For the keyword-that-should-be-a-STRING problem, I'm hesitant to use >> either of those solutions because of the sheer number of keywords in >> this grammar. >> >> >> Ideally what I'd like to do is what I did in Flex and Bison (which >> I'm porting this grammar from). What I did there was have the parser >> control how the lexer interpreted subsequent tokens. I embedded a >> rule in the parser, immediately after the 'VERSION' token, to tell >> Flex to enter a "force-the-next-token-to-be-a-STRING-no-matter-what" >> start state. It worked beautifully. I got most of the way through >> implementing that in my ANTLR grammar when I found out that >> ANTLRFileStream reads all the tokens in before the parser even starts >> up -- which means the parser can't give the lexer any direction over token interpretation. >> >> >> Thoughts, suggestions, outrageous flames? Is there a "good" way to >> do this, or maybe is there a completely different approach I should take? >> >> Thanks! >> -Chris >> >> >> >> > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: > http://www.antlr.org/mailman/options/antlr-interest/your-email-address > > This email and any attachments are intended for the sole use of the named recipient(s) and contain(s) confidential information that may be proprietary, privileged or copyrighted under applicable law. If you are not the intended recipient, do not read, copy, or forward this email message or any attachments. Delete this email message and any attachments immediately. List: http://www.antlr.org/mailman/listinfo/antlr-interest Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address List: http://www.antlr.org/mailman/listinfo/antlr-interest Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address List: http://www.antlr.org/mailman/listinfo/antlr-interest Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address -- You received this message because you are subscribed to the Google Groups "il-antlr-interest" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/il-antlr-interest?hl=en.
