On Thu, Dec 10, 2009 at 8:59 AM, Jim Idle <[email protected]> wrote:
> No - this is the wrong technique as what happens is that the lexer is simpler 
> but still rejects malformed identifiers in the wrong way. You have to look 
> for a valid start character, then consume until something MUST be something 
> other than an identifier character. What you are looking to do is interpolate 
> an identifier that has invalid characters, then issue "Identifiers cannot 
> contain character 'xxxx'" etc. The trick is to not match characters that are 
> identifiers but stop on characters that definitely cannot be. There is a 
> subset that reduces the error margins considerably. Otherwise you throw 
> lexical errors and bunches of unrelated errors.
>

I approached the problem as you suggested, using semantic predicates.
I have yet to test how it behaves when malformed input is read, but
I think this change made the parser more efficient. I transformed
the IDENTIFIER rule to:

IDENTIFIER
  : {Character.isJavaIdentifierStart(input.LA(1))}?=> .
    ( {Character.isJavaIdentifierPart(input.LA(1))}?=> . )*
  ;
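
For anyone who wants to check what the two predicates accept without
generating a lexer, here is a minimal plain-Java sketch of the same
matching logic: one start character, then as many part characters as
follow. The class and method names (IdentifierScan, identifierLength)
are made up for illustration and are not part of ANTLR or of my grammar;
only the two java.lang.Character calls come from the rule above.

```java
// Hypothetical helper, for illustration only: mirrors the IDENTIFIER rule's
// predicates outside of ANTLR.
final class IdentifierScan {
    private IdentifierScan() {}

    /**
     * Returns the length of the Java-style identifier that starts at
     * {@code offset} in {@code text}, or 0 if no identifier starts there.
     */
    static int identifierLength(CharSequence text, int offset) {
        // The first character must satisfy the "start" predicate.
        if (offset >= text.length()
                || !Character.isJavaIdentifierStart(text.charAt(offset))) {
            return 0;
        }
        int end = offset + 1;
        // Keep consuming while the "part" predicate holds.
        while (end < text.length()
                && Character.isJavaIdentifierPart(text.charAt(end))) {
            end++;
        }
        return end - offset;
    }
}
```

With this sketch, identifierLength("foo42+bar", 0) gives 5 and
identifierLength("9lives", 0) gives 0, since a digit is a valid part
character but not a valid start character. Note that it works on char
values, so supplementary characters (surrogate pairs) are not handled.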


-- 
Greetings
Marcin Rzeźnicki

