The more general approach is to characterize key characters (DOT) and
character strings (UPPER_WORD, LOWER_WORD, WORD) only broadly in the
lexer, and to use the parser to build a well-structured AST. Do little,
if any, analysis in the parser. Then use multiple tree pattern matchers
to identify key tokens and token sequences in context, each tree-pattern
matcher implementing one discrete analysis rule or a closely related set
of rules. This makes the system easy to adapt to changes in both the
keyword set and the recognition contexts.
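A minimal sketch of that layering, written in plain Python rather than
ANTLR, and assuming the token names above. The regular expressions, the
extra WS/OTHER tokens, and the two example rules (dotted upper-case
keywords such as SYS.ADMIN, and sentence-ending periods) are all
hypothetical. To keep it short, the parser/AST stage is collapsed into a
flat token list, so each "tree pattern matcher" here is really a
sequence matcher; in ANTLR the matchers would walk the parser's AST.
The lexer alternatives are tried in order, so an earlier rule wins, much
like top-down lexer rule priority.

```python
import re
from typing import Callable, Dict, List, NamedTuple, Tuple


class Token(NamedTuple):
    type: str
    text: str
    pos: int  # character offset in the input


# Broad, keyword-agnostic token classes (hypothetical patterns).
# Alternatives are tried in order, so an earlier rule wins when
# several could match at the same position.
LEXER_RULES = [
    ("DOT", r"\."),
    ("UPPER_WORD", r"[A-Z]+\b"),
    ("LOWER_WORD", r"[a-z]+\b"),
    ("WORD", r"\w+"),
    ("WS", r"\s+"),
    ("OTHER", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in LEXER_RULES))


def lex(text: str) -> List[Token]:
    """Classify the input into broad tokens, dropping whitespace."""
    return [Token(m.lastgroup, m.group(), m.start())
            for m in MASTER.finditer(text) if m.lastgroup != "WS"]


def adjacent(a: Token, b: Token) -> bool:
    """True when b starts exactly where a ends (no whitespace between)."""
    return a.pos + len(a.text) == b.pos


Match = Tuple[str, int]  # (matched text, start offset)


def dotted_upper_keywords(tokens: List[Token]) -> List[Match]:
    """Rule 1: UPPER_WORD (DOT UPPER_WORD)+ with nothing in between, e.g. SYS.ADMIN."""
    found: List[Match] = []
    i = 0
    while i < len(tokens):
        j = i
        if tokens[i].type == "UPPER_WORD":
            # Extend the run while the next two tokens are an adjacent DOT and UPPER_WORD.
            while (j + 2 < len(tokens)
                   and tokens[j + 1].type == "DOT"
                   and tokens[j + 2].type == "UPPER_WORD"
                   and adjacent(tokens[j], tokens[j + 1])
                   and adjacent(tokens[j + 1], tokens[j + 2])):
                j += 2
        if j > i:
            found.append(("".join(t.text for t in tokens[i:j + 1]), tokens[i].pos))
            i = j + 1
        else:
            i += 1
    return found


def sentence_ends(tokens: List[Token]) -> List[Match]:
    """Rule 2: a DOT followed by whitespace or end of input ends a sentence."""
    return [(t.text, t.pos) for i, t in enumerate(tokens)
            if t.type == "DOT"
            and (i + 1 == len(tokens) or not adjacent(t, tokens[i + 1]))]


# Each matcher is an independent rule; adding a rule does not touch the lexer.
MATCHERS: Dict[str, Callable[[List[Token]], List[Match]]] = {
    "dotted_keyword": dotted_upper_keywords,
    "sentence_end": sentence_ends,
}


def analyze(text: str) -> Dict[str, List[Match]]:
    tokens = lex(text)
    return {name: rule(tokens) for name, rule in MATCHERS.items()}


if __name__ == "__main__":
    print(analyze("Restart SYS.ADMIN now. Then check the U.S.A build. Done."))
    # {'dotted_keyword': [('SYS.ADMIN', 8), ('U.S.A', 38)],
    #  'sentence_end': [('.', 21), ('.', 49), ('.', 55)]}
```

The same DOT token feeds both rules, so "other behaviours around
periods" stay separate from keyword recognition. Supporting a new
keyword format means adding one function to MATCHERS; the lexer does not
change.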

On 2/7/2010 12:00 PM, James Crowley wrote:
> Hi Gerald,
>
> Thanks so much for that. What about the scenario where we don't know
> what the keywords are specifically, just the format they appear in
> (i.e. grouping something upper case with a period in the middle),
> whilst still retaining the other behaviours around periods if they
> appear elsewhere? Is this getting too difficult within the
> constraints of what context-free grammars can do?
>
> Many thanks for your help
>
> James
>
> On 6 February 2010 06:22, Gerald Rosenberg <[email protected]> wrote:
>
>     While it may be heresy in the world of context-free grammars,
>     ANTLR actually performs quite nicely for many NLP problems.
>
>     The illustrated approach works well for explicitly identifying a
>     few keywords in context. You just have to watch for the lexer
>     functionally being k=1 and remember that the lexer rules apply
>     top-down.
>
>     There is also a lexer filter option if all you want to do is find
>     keywords.
>
>
>     On 2/5/2010 4:45 PM, James Crowley wrote:
>
>         Hi Michael,
>
>         Thanks for the response. Sadly not; the language is English
>         ;-) But I'm just hoping to do some basic tokenization of
>         paragraphs of text (essentially just extracting keywords). I
>         thought it would be faster/easier to use a tool like ANTLR
>         than to use regex or attempt to roll my own. Am I being
>         foolish for even attempting this?
>
>         James
>
>         On 5 February 2010 21:29, Michael Matera <[email protected]>
>         wrote:


List: http://www.antlr.org/mailman/listinfo/antlr-interest