Hi, I cannot reproduce this using your supplied grammar: as long as the required NEWLINE is in place, your example works just fine. If, however, I do not provide a newline in the input, I'm hit by a NoViableAltException.
I.e. for input "Comment: NOCHandle John Q. Hacker" (no trailing newline) I get the result you describe, while input "Comment: NOCHandle John Q. Hacker\n" works perfectly, which seems reasonable. Same result for NCHandle. This is, of course, when starting from rule asline.

Am I missing something?

Cheers,
Pop

On Fri, Jan 28, 2011 at 8:51 AM, Robert J. Hansen <[email protected]> wrote:

> I haven't done any work with lexers and parsers in many years, and
> figured a good way to go about getting re-acquainted would be to find a
> big corpus of text and put together a translator. The corpus I had
> around was the ARIN WHOIS information, which is basically key-value
> coding in a record-based format. Newlines are significant, but other
> whitespace generally isn't.
>
> I'm now running into a brick wall, though, with trying to enable greedy
> matching -- scarfing up everything to end-of-line and returning that
> back as a string. I can *almost* do it, but I'm getting killed on some
> corner cases.
>
> The following is an abbreviated version of the grammar. The bug is
> present in this, but all actions, etc., have been omitted.
>
> =====
> grammar foo;
>
> file    : (block|NEWLINE)*;
> block   : asblock
>         | netblock;
> asblock : asbegin asline* NEWLINE;
> netblock: netbegin netline* NEWLINE;
> netline : n_nh;
> netbegin: 'NetHandle:' words;
> n_nh    : 'NOCHandle:' words;
> asline  : 'Comment:' words;
> asbegin : 'ASHandle:' words;
> words   : word (word)* NEWLINE
>         | NEWLINE;
> word    : WORD;
> WORD    : ~(' '|'\t'|'\r'|'\n')+;
> NEWLINE : '\r'?'\n';
> WS      : (' '|'\t') { skip(); };
> =====
>
> ... Now, consider the derivation of the line:
>
>     Comment: NOCHandle John Q. Hacker
>
> ... starting from rule asline. asline derives out to 'Comment:' on the
> left, words on the right, and from there straight to NoViableAltException.
>
> However, if I change it to:
>
>     Comment: NCHandle John Q. Hacker
>
> ... then it derives successfully.
>
> It appears that when trying to derive the words rule, it sees that rule
> n_nh could also apply and can't decide what derivation to use. But why?
> n_nh is not listed as a child rule of words. How can I fix this so
> that the words rule will grab *everything* to the end of the line?
>
> My second concern: when trying to parse a multi-gig file using a grammar
> much like the above, Java demands it be given absurdly huge heap sizes.
> I am assuming that like most compilers ANTLR has to construct the
> entire tree in memory before it can walk the tree doing various actions:
> however, if there's some way to mitigate the heap memory problem, I
> would be deeply appreciative.
>
> Thank you all for your help!
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe:
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
