No, this is the wrong technique: the lexer becomes simpler, but it still
rejects malformed identifiers in the wrong way. You have to look for a valid
identifier start character, then consume characters until you reach one that
MUST be something other than an identifier character. What you want is to
accept an identifier that contains invalid characters as a single token, and
then issue an error such as "Identifiers cannot contain character 'xxxx'".
The trick is not to match only the characters that are valid in identifiers,
but to stop only on characters that definitely cannot be part of one.
Stopping on that subset reduces the margin for error considerably. Otherwise
you get a lexical error followed by a bunch of unrelated errors.
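A minimal plain-Java sketch of this technique, outside ANTLR, is below. The class `LenientIdentifierScanner`, its `scan` method and the `TERMINATORS` set are all hypothetical: the set of characters that "definitely cannot" be in an identifier is an illustrative assumption and must be adjusted for the language being lexed. In a real ANTLR lexer the same loop would sit in the IDENTIFIER rule's action, and the messages would go through the recognizer's error reporting instead of a list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of ANTLR): lexes an identifier leniently.
// It starts on a valid identifier-start character, then consumes everything
// up to a character that can never belong to an identifier, recording an
// error for each illegal character instead of splitting the token.
final class LenientIdentifierScanner {

    // Assumption: characters (besides whitespace) that definitely end an
    // identifier in the target language. Illustrative only.
    private static final String TERMINATORS = "+-*/%=<>!&|^~?:;,.()[]{}\"'@";

    static final class Result {
        final String text;          // whole identifier-like run, bad chars included
        final int end;              // index just past the run
        final List<String> errors;  // one message per illegal character

        Result(String text, int end, List<String> errors) {
            this.text = text;
            this.end = end;
            this.errors = errors;
        }
    }

    static boolean definitelyEndsIdentifier(int cp) {
        return Character.isWhitespace(cp) || TERMINATORS.indexOf(cp) >= 0;
    }

    // Returns null when the character at 'start' cannot begin an identifier,
    // so the caller can try its other token rules instead.
    static Result scan(String input, int start) {
        if (start >= input.length()) {
            return null;
        }
        int first = input.codePointAt(start);
        if (!Character.isJavaIdentifierStart(first)) {
            return null;
        }
        List<String> errors = new ArrayList<>();
        int i = start + Character.charCount(first);
        while (i < input.length()) {
            int cp = input.codePointAt(i);
            if (definitelyEndsIdentifier(cp)) {
                break; // this character can only start a new token
            }
            if (!Character.isJavaIdentifierPart(cp)) {
                errors.add("Identifiers cannot contain character '"
                        + new String(Character.toChars(cp)) + "'");
            }
            i += Character.charCount(cp);
        }
        return new Result(input.substring(start, i), i, errors);
    }
}
```

For example, `scan("foo#bar baz", 0)` returns the single token `foo#bar` with one error for `#`, instead of a token `foo`, a lexical error on `#`, and a stray `bar` that the parser then complains about as well.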


Jim

> -----Original Message-----
> From: [email protected] [mailto:antlr-interest-
> [email protected]] On Behalf Of David-Sarah Hopwood
> Sent: Wednesday, December 09, 2009 10:09 PM
> To: [email protected]
> Subject: Re: [antlr-interest] Lexer and Java keywords
> 
> Jim Idle wrote:
> > The issue is that your lexer is too complicated for the standard
> > timeout on analysis values. Use:
> >
> > -Xconversiontimeout=32000
> >
> > And it will generate just fine.
> [...]
> 
> This is probably due to listing the character ranges for JavaLetter and
> JavaLetterOrDigit explicitly. Using the technique below (based on code
> from the ECMAScript 3 grammar by Patrick Hulsmeijer) should make the
> lexer small enough to generate with the default timeout. Note that
> you'll have to adjust this for any differences between the identifier
> syntax of the language you're trying to parse and that of Java; I
> notice that you had '\u0000'..'\u0008' | '\u000e'..'\u001b' in
> JavaLetterOrDigit, for example.
> 
> 
> fragment IdentifierStartASCII
>   : 'a'..'z'
>   | 'A'..'Z'
>   | '$'
>   | '_'
>   ;
> 
> fragment IdentifierPart
>   : IdentifierStartASCII
>   | '0'..'9'
>   | { Character.isJavaIdentifierPart(input.LA(1)) }?
>       { matchAny(); }
>   ;
> 
> // This generates mIdentifierRest() used below.
> fragment IdentifierRest
>   : IdentifierPart*
>   ;
> 
> IDENTIFIER
>   : IdentifierStartASCII IdentifierRest
>   | { if (!Character.isJavaIdentifierStart(input.LA(1))) {
>         throw new NoViableAltException("identifier start", 0, 0, input);
>       }
>       matchAny(); mIdentifierRest(); }
>   ;
> 
> --
> David-Sarah Hopwood  ⚥  http://davidsarah.livejournal.com
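The decisions encoded by the IdentifierStartASCII and IdentifierPart fragments in the quoted grammar can be sketched in plain Java as below; the class and method names are hypothetical. ASCII characters are matched by a few explicit ranges, which keeps what ANTLR has to analyse small, and everything else is delegated to java.lang.Character at run time, so no Unicode ranges need to be listed in the grammar.

```java
// Hypothetical plain-Java equivalent of the checks made by the
// IdentifierStartASCII / IdentifierPart fragments and the IDENTIFIER rule.
final class IdentifierChars {

    // IdentifierStartASCII: 'a'..'z' | 'A'..'Z' | '$' | '_'
    static boolean isAsciiStart(int c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '$' || c == '_';
    }

    // IDENTIFIER: ASCII ranges first, then the Unicode fallback that the
    // rule's second alternative performs in its action.
    static boolean isStart(int c) {
        return isAsciiStart(c) || Character.isJavaIdentifierStart(c);
    }

    // IdentifierPart: ASCII start characters, digits, or anything the
    // semantic predicate Character.isJavaIdentifierPart accepts.
    static boolean isPart(int c) {
        return isAsciiStart(c) || (c >= '0' && c <= '9') || Character.isJavaIdentifierPart(c);
    }
}
```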