The more general approach is to broadly characterize key characters (DOT) and character strings (UPPER_WORD, LOWER_WORD, WORD) in the lexer and use the parser to build a well-structured AST. Do little, if any, analysis in the parser. Then use multiple tree-pattern matchers to identify key tokens and token sequences in context, with each matcher implementing a discrete analysis rule or a closely related set of rules. This makes the system easy to adapt to changes in the keyword set and the recognition contexts.
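
To make the division of labour concrete, here is a rough sketch in plain Python rather than ANTLR. It is only an illustration under stated assumptions: a regex stands in for the lexer, a flat token list stands in for the AST, and the two rule functions (`dotted_upper_names` and `sentence_end_count`, both hypothetical names) stand in for tree-pattern matchers. The point is that the lexer only knows broad token classes, and each recognition rule lives in its own small pass.

```python
import re

# Broad token classes only; no keyword knowledge lives in the "lexer".
# Alternatives are tried in order, much like top-down lexer rules.
TOKEN_RE = re.compile(r"""
    (?P<DOT>\.)
  | (?P<UPPER_WORD>[A-Z]+(?![A-Za-z]))   # all upper case, e.g. ACME
  | (?P<LOWER_WORD>[a-z]+(?![A-Za-z]))   # all lower case, e.g. today
  | (?P<WORD>[A-Za-z]+)                  # mixed case, e.g. Contact
  | (?P<OTHER>\S)                        # any other non-space character
""", re.VERBOSE)


def lex(text):
    """Return (kind, text, start, end) tuples; whitespace is skipped."""
    return [(m.lastgroup, m.group(), m.start(), m.end())
            for m in TOKEN_RE.finditer(text)]


def adjacent(a, b):
    """True when token b starts exactly where token a ends."""
    return a[3] == b[2]


def dotted_upper_names(tokens):
    """Rule 1 (hypothetical): UPPER_WORD DOT UPPER_WORD with no gaps."""
    hits, i = [], 0
    while i + 2 < len(tokens):
        a, dot, b = tokens[i:i + 3]
        kinds = (a[0], dot[0], b[0])
        if (kinds == ("UPPER_WORD", "DOT", "UPPER_WORD")
                and adjacent(a, dot) and adjacent(dot, b)):
            hits.append(a[1] + dot[1] + b[1])
            i += 3
        else:
            i += 1
    return hits


def sentence_end_count(tokens):
    """Rule 2 (hypothetical): a DOT followed by a gap or end of input."""
    count = 0
    for i, tok in enumerate(tokens):
        if tok[0] != "DOT":
            continue
        if i + 1 == len(tokens) or not adjacent(tok, tokens[i + 1]):
            count += 1
    return count


if __name__ == "__main__":
    toks = lex("Contact ACME.CORP today. The END. NEXT item.")
    print(dotted_upper_names(toks))
    print(sentence_end_count(toks))
```

Adding a new recognition context then means adding another small rule function; the token classes do not change. In ANTLR proper, the rules would be tree grammars or tree-pattern matchers walking the AST instead of list scans.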

On 2/7/2010 12:00 PM, James Crowley wrote:
> Hi Gerald,
>
> Thanks so much for that. What about the scenario where we don't know
> what the keywords were specifically - just the format they appear in
> (ie to group just that something upper case with a period in the
> middle)... whilst still retaining other behaviours around periods if
> they appear elsewhere? Is this then getting too difficult within the
> constraints of what context-free grammars can do?
>
> Many thanks for your help
>
> James
>
> On 6 February 2010 06:22, Gerald Rosenberg <[email protected]> wrote:
>
>     While it may be heresy in the world of context-free grammars,
>     Antlr actually performs quite nicely for many NLP problems.
>
>     The illustrated approach works well for explicitly identifying a
>     few key words in context. Just have to watch for the lexer
>     functionally being k=1 and remember that the lexer rules apply
>     top-down.
>
>     There is a filter option if all you want to do is just find keywords.
>
>
>     On 2/5/2010 4:45 PM, James Crowley wrote:
>
>         Hi Michael,
>
>         Thanks for the response. Sadly not - the language is English
>         ;-) But just
>         hoping to do some basic tokenization of paragraphs of text
>         (essentially just
>         extracting keywords) - thought it would be faster/easier to
>         use a tool like
>         ANTLR than using regex or attempting to roll my own. Am I
>         being foolish for
>         even attempting this?
>
>         James
>
>         On 5 February 2010 21:29, Michael Matera <[email protected]> wrote:
