On Thu, Dec 10, 2009 at 8:59 AM, Jim Idle <[email protected]> wrote:
> No - this is the wrong technique as what happens is that the lexer is simpler
> but still rejects malformed identifiers in the wrong way. You have to look
> for a valid start character, then consume until something MUST be something
> other than an identifier character. What you are looking to do is interpolate
> an identifier that has invalid characters, then issue "Identifiers cannot
> contain character 'xxxx'" etc. The trick is to not match characters that are
> identifiers but stop on characters that definitely cannot be. There is a
> subset that reduces the error margins considerably. Otherwise you throw
> lexical errors and bunches of unrelated errors.
>
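For reference, a rough plain-Java sketch of how I read that suggestion
(this is not ANTLR code). The class name, the Result type and the "hard
stop" character set are all made up for illustration; a real lexer would
derive the stop set from its own token definitions. The idea is to accept
a valid start character, keep consuming until a character that definitely
cannot belong to an identifier, and report any invalid characters
swallowed along the way:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ANTLR: scans one "lenient" identifier.
final class LenientIdentifierScanner {

    /** Raw text consumed plus one message per invalid character found in it. */
    static final class Result {
        final String text;
        final List<String> errors = new ArrayList<>();
        Result(String text) { this.text = text; }
    }

    // Assumed set of characters that can never be part of an identifier:
    // whitespace and a handful of operators and delimiters.
    static boolean isHardStop(int cp) {
        return Character.isWhitespace(cp) || "+-*/=<>(){}[];,.".indexOf(cp) >= 0;
    }

    /** Scans from 'start'; returns null if no identifier can begin there. */
    static Result scan(String input, int start) {
        if (start >= input.length()
                || !Character.isJavaIdentifierStart(input.codePointAt(start))) {
            return null;
        }
        List<String> errors = new ArrayList<>();
        int i = start;
        while (i < input.length()) {
            int cp = input.codePointAt(i);
            if (isHardStop(cp)) {
                break; // definitely not an identifier character: stop here
            }
            if (!Character.isJavaIdentifierPart(cp)) {
                // Swallow the bad character but remember to complain about it.
                errors.add("Identifiers cannot contain character '"
                        + new String(Character.toChars(cp)) + "'");
            }
            i += Character.charCount(cp);
        }
        Result result = new Result(input.substring(start, i));
        result.errors.addAll(errors);
        return result;
    }
}
```

With this sketch, scanning "foo#bar = 1" from offset 0 consumes "foo#bar"
as a single token and records one message about '#', instead of producing
a lexical error in the middle of the name.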
I approached the problem as you suggested, using semantic predicates.
I have yet to test how it behaves on malformed input, but
I think this change made the parser more efficient. I transformed the
IDENTIFIER rule to:
IDENTIFIER
    :   {Character.isJavaIdentifierStart(input.LA(1))}?=> .
        ( {Character.isJavaIdentifierPart(input.LA(1))}?=> . )*
    ;
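
Outside ANTLR, the matching logic of that rule amounts to roughly the
following. This is only a hand-written sketch (the IdentifierMatcher
class and matchEnd method are names I made up), not what ANTLR actually
generates. It also works on code points, whereas input.LA(1) in the
lexer, as far as I understand, hands back one char at a time:

```java
// Hypothetical stand-alone equivalent of the predicated IDENTIFIER rule.
final class IdentifierMatcher {

    /**
     * Returns the index just past the identifier that begins at 'start',
     * or -1 if the character at 'start' cannot begin an identifier.
     */
    static int matchEnd(String input, int start) {
        if (start >= input.length()) {
            return -1; // nothing left to match
        }
        int cp = input.codePointAt(start);
        if (!Character.isJavaIdentifierStart(cp)) {
            return -1; // corresponds to the first gated predicate failing
        }
        int i = start + Character.charCount(cp);
        while (i < input.length()) {
            cp = input.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                break; // corresponds to the predicate inside ( ... )* failing
            }
            i += Character.charCount(cp);
        }
        return i;
    }
}
```

Note that this version stops at the first character that is not an
identifier part, so it says nothing yet about how malformed identifiers
get reported.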
--
Greetings
Marcin Rzeźnicki