My first attempt was to use regexps in Perl, but the error messages don't have enough structure (and I was too lazy to write all those regexps...).
Since I had a very good experience with ANTLR some years ago, I thought I'd use that, since then I get the AST for free ;)

I have refactored my grammar and am currently in the process of eliminating prefixes in the tokens (by hand... is there a tool for that?).

I had never heard of JFlex; it sounds interesting, though *read* ;)

Thanks
Fabian

On 11/13/2010 03:55 PM, Joachim Schrod wrote:
> Jim Idle wrote:
>
>> From: [email protected] [mailto:antlr-interest-
>> [email protected]] On Behalf Of Joachim Schrod
>>>
>>> Fabian Haupt wrote:
>>>>
>>>> I'm getting a NoViableAltException: line 1:55 no viable alternative at
>>>> input '.[CheckIntegrity'
>>>>
>>>> with the input of
>>>> 'The lower level block specifies a right link block of 0.
>>>> [CheckIntegrity+343^%SYS.DATABASE:%SYS]'
>>>> starting with the 'test' rule.
>>>>
>>>>
>>>> this is the grammar:
>>>> ----------------------------
>>>> grammar integrit;
>>>>
>>>> options {
>>>> language= Java;
>>>> }
>>>>
>>>>
>>>> test:'The lower level block specifies a right link block of '+INT+'.'
>>>> WS debugnote NEWLINE;
>>>>
>>>> firstNodePtrWrong: INT+'. We were expecting it to point to ';
>>>>
>>>>
>>>> debugnote:'['+ID+'+'+INT+'^%SYS.DATABASE:%SYS'+']';
>>>>
>>>> ID : ('a'..'z'|'A'..'Z')+ ;
>>>> INT : '0'..'9'+ ;
>>>> NEWLINE:('\r'? '\n');
>>>> WS : (' '|'\t')+ {skip();} ;
>>>
>>> Without running it --
>>> You demand a WS in the test rule that will never appear there as you
>>> skip that token. Don't you mean NEWLINE there?
>>
>> This is correct, you are asking for WS that will be skipped, but also, hard
>> coding the specific message is so long will get you in to trouble I think.
>> You are probably better off with awk for something like this I think.
>
> If Fabian's error messages are basically just regexps, probably.
> But if the information structure is context free, ANTLR is a very
> good bet.
>
> I'm currently using ANTLR to extract data from invoice texts in PDF
> files and to assure that I detect all information and that the
> expected invoice structure is complete and hasn't changed. This is
> very hard to do with regexps, but has the same base problem as
> Fabian: Lots of long constant text strings that serve basically as
> keywords and mark `places' in that PDF document. ANTLR is a great
> tool to accomplish structure checking, with it's parser generation
> facilities. If the lexical input is highly unstructured, it's a bit
> of a pain in the back, but can be handled. (I ran into similar
> lexer problem as Fabian. Eventually, I resolved them by using
> jflex. ;-))
>
> Cheers,
> Joachim
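
PS: To illustrate the point made in the quoted replies (a parser rule that demands WS can never match once the lexer skips that token), here is a minimal sketch in Python. It is only a simplified stand-in for the generated lexer, not ANTLR itself: the names TOKEN_RULES, SKIPPED, tokenize and parser_sees are hypothetical, the long string literals from the parser rules are left out, and all punctuation is lumped into one assumed PUNCT token type.

```python
import re

# Token rules loosely mirroring the lexer rules of the quoted grammar.
# Assumption: the long literals from the parser rules are omitted and all
# punctuation is collapsed into a single PUNCT type to keep the sketch short.
TOKEN_RULES = [
    ("ID", r"[A-Za-z]+"),
    ("INT", r"[0-9]+"),
    ("NEWLINE", r"\r?\n"),
    ("WS", r"[ \t]+"),
    ("PUNCT", r"[\[\]+.^%:]"),
]

# Plays the role of the {skip();} action on the WS rule.
SKIPPED = {"WS"}

MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_RULES))


def tokenize(text):
    """Return the (type, text) pairs that would be handed to the parser."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        # Skipped tokens are consumed but never reach the parser.
        if m.lastgroup not in SKIPPED:
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def parser_sees(text):
    """Return only the token types, in the order the parser receives them."""
    return [kind for kind, _ in tokenize(text)]


sample = "block of 0. [CheckIntegrity+343^%SYS.DATABASE:%SYS]\n"
print(parser_sees(sample))
```

No WS entry shows up in the printed list, so a rule that requires WS between '.' and the debug note has nothing to match. In this simplified picture, dropping WS from the rule (or no longer skipping it) removes that mismatch.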
