[il-antlr-interest: 27806] Re: [antlr-interest] Real simple grammar - newbie help?!

James Crowley Fri, 05 Feb 2010 16:46:15 -0800

Hi Michael,

Thanks for the response. Sadly not: the language is English ;-) I'm just
hoping to do some basic tokenization of paragraphs of text (essentially just
extracting keywords), and thought it would be faster and easier to use a tool
like ANTLR than to use regexes or roll my own. Am I being foolish for
even attempting this?
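
For comparison, the regex route mentioned above could be sketched roughly as
follows in Python. This is only an illustration and not something posted in
the thread: the pattern and the names KEYWORD_RE and extract_keywords are
assumptions, chosen to mirror the KEYWORD/ABBRV/WORD rules quoted below (a dot
belongs to a keyword only when letters follow it).

```python
import re

# Mirrors the grammar quoted in the thread: a keyword is either an
# abbreviation (optional letters, a dot, then one or more letters) or a
# plain run of letters. The abbreviation alternative is listed first so
# that "ASP.NET" is tried as a single token before falling back to "ASP".
KEYWORD_RE = re.compile(r"[A-Za-z]*\.[A-Za-z]+|[A-Za-z]+")


def extract_keywords(text):
    """Return keyword-like tokens in order; all other characters are skipped."""
    return KEYWORD_RE.findall(text)


if __name__ == "__main__":
    print(extract_keywords("some ASP.NET and .NET stuff. that work."))
```

On the test input from the original question this yields "some", "ASP.NET",
"and", ".NET", "stuff", "that", "work": the regex engine backtracks when a dot
is not followed by a letter, which is the lookahead behaviour the lexer rule
was missing.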


James

On 5 February 2010 21:29, Michael Matera <[email protected]> wrote:

> Hi James,
>
> I don't think this grammar is that simple. This is not a context-free
> grammar: the meaning of '.' depends on what follows it. In other words,
> when the lexer looks at the dot in '.NET' you expect a KEYWORD production,
> but when it sees the dot in 'work.' you expect no token. This is a problem.
> Can you redesign this language?
>
> Cheers
> ./m
>
> James Crowley wrote:
>
>> hey guys,
>>
>> I've got a really simple grammar that I'm trying to get working, but I'm
>> failing miserably at the moment. I'd really appreciate some pointers on
>> this...
>>
>> root : (keyword|ignore)*;
>> keyword : KEYWORD;
>> ignore : IGNORE;
>>
>> KEYWORD : ABBRV|WORD;
>> fragment WORD : ALPHA+;
>> fragment ALPHA : 'a'..'z'|'A'..'Z';
>> fragment ABBRV : WORD?('.'WORD);
>>
>> IGNORE : .{ Skip(); };
>>
>> With the following test input:
>>
>> "some ASP.NET and .NET stuff. that work."
>>
>> I'm wanting a tree that is just a list of keyword nodes,
>>
>> "some", "ASP.NET", "and", ".NET", "stuff", "that", "work"
>>
>> At the moment I get
>>
>> "some", "ASP.NET", "and", ".NET", "stuff. that",
>>
>> (for some reason "." appears within the last keyword, and it misses "work")
>>
>> If I change the ABBRV clause to
>>
>> fragment ABBRV : ('.'WORD);
>>
>> then that works fine, but I get keyword (asp) and keyword (.net)
>> separately, and I need them as a single token. Any help you can give
>> would be much appreciated.
>>
>> Many thanks
>>
>> James
>>
>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>> Unsubscribe:
>> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>>
>>


