Hi Michael,

Thanks for the response. Sadly not - the language is English ;-) I'm just hoping to do some basic tokenization of paragraphs of text (essentially just extracting keywords), and thought it would be faster and easier to use a tool like ANTLR than to use regex or roll my own. Am I being foolish for even attempting this?

James

On 5 February 2010 21:29, Michael Matera <[email protected]> wrote:

> Hi James,
>
> I don't think this grammar is that simple. This is not a context-free
> grammar: the meaning of '.' depends on what follows it. In other words,
> when the lexer looks at the dot in '.NET' you expect a KEYWORD production,
> but when it sees the dot in 'work.' you expect no token. This is a problem.
> Can you redesign this language?
>
> Cheers
> ./m
>
> James Crowley wrote:
>
>> hey guys,
>>
>> I've got a really simple grammar that I'm trying to get working, but
>> failing miserably at the moment. Would really appreciate some pointers
>> on this...
>>
>> root : (keyword|ignore)*;
>> keyword : KEYWORD;
>> ignore : IGNORE;
>>
>> KEYWORD : ABBRV|WORD;
>> fragment WORD : ALPHA+;
>> fragment ALPHA : 'a'..'z'|'A'..'Z';
>> fragment ABBRV : WORD?('.'WORD);
>>
>> IGNORE : .{ Skip(); };
>>
>> With the following test input:
>>
>> "some ASP.NET and .NET stuff. that work."
>>
>> I'm wanting a tree that is just a list of keyword nodes,
>>
>> "some", "ASP.NET", "and", ".NET", "stuff", "that", "work"
>>
>> At the moment I get
>>
>> "some", "ASP.NET", "and", ".NET", "stuff. that",
>>
>> (for some reason "." appears within the last keyword, and it misses
>> "work")
>>
>> If I change the ABBRV clause to
>>
>> fragment ABBRV : ('.'WORD);
>>
>> then that works fine, but I get keyword (asp) and keyword (.net)
>> separately, and I need them as a single token. Any help you can give
>> would be much appreciated.
>>
>> Many thanks
>>
>> James
>>
>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>> Unsubscribe:
>> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
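
For comparison, the regex route mentioned in the reply can be sketched in a few lines. This is only an illustrative Python sketch, not something posted in the thread: the function name `extract_keywords` and the pattern itself are assumptions, written to mirror the quoted KEYWORD rule (an optional word, a dot, then a word; otherwise a plain word). Because the dot is only consumed when a letter follows it, the full stops in "stuff." and "work." are dropped, which is the behaviour the original post asks for.

```python
import re

# Mirrors the quoted rule KEYWORD : ABBRV | WORD, where
# ABBRV is WORD? '.' WORD and WORD is one or more ASCII letters.
# The first alternative only matches when a letter follows the dot,
# so a sentence-ending full stop is never pulled into a keyword.
KEYWORD = re.compile(r"[A-Za-z]*\.[A-Za-z]+|[A-Za-z]+")


def extract_keywords(text):
    """Return the keywords in text, in order of appearance."""
    return KEYWORD.findall(text)


print(extract_keywords("some ASP.NET and .NET stuff. that work."))
# ['some', 'ASP.NET', 'and', '.NET', 'stuff', 'that', 'work']
```

The regex engine backtracks when the dot is not followed by a letter, which is the lookahead decision the lexer grammar above is struggling to make.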
