Eduard Ralph wrote:
> Hi community,
> 
> I'm struggling with lexing preprocessing directives according to the C# 
> spec. The BNF is:
> 
> Whitespace(opt) '#' Whitespace(opt) 'error' input-characters
> Whitespace(opt) '#' Whitespace(opt) 'warning' input-characters
> Whitespace(opt) '#' Whitespace(opt) 'line'  ...
> 
> where
>  Whitespace(opt) is zero or more whitespace characters ('\u0020', '\u00A0',
>  and a few more)
>  input-characters is any run of characters except a newline ('\n' and a
>  few more)
> 
> I wrote the following in the lexer (the other rules are fragments):
> 
> PP_DIAGNOSTIC
>     : (WHITESPACE* HASH WHITESPACE* 'error')=>
>           WHITESPACE* HASH WHITESPACE* ERROR INPUT_CHARACTER*
>     | (WHITESPACE* HASH WHITESPACE* 'warning')=>
>           WHITESPACE* HASH WHITESPACE* WARNING INPUT_CHARACTER*
>     ;

Both alternatives probably need a NEWLINE at the end.

> PP_LINE
>     : (WHITESPACE* HASH WHITESPACE* 'line')=>
>           WHITESPACE* HASH WHITESPACE* LINE PP_LINE_INDICATOR NEWLINE
>     ;

This will not skip whitespace between LINE and PP_LINE_INDICATOR or
between PP_LINE_INDICATOR and NEWLINE.

I think you probably want
  ... => WHITESPACE* HASH WHITESPACE* LINE WHITESPACE* PP_LINE_INDICATOR
           WHITESPACE* NEWLINE

but that is likely independent of your problem with the lexer not
recognising which rule applies.
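To illustrate the whitespace point, here is a rough Python model that writes both shapes of the PP_LINE rule as regular expressions. It is only a sketch: `WS`, `INDICATOR` and `matches` are hypothetical names, `WS` covers just a few of the C# whitespace characters, and unlike the quoted fragment, `INDICATOR` allows whitespace between the line number and the file name, as the C# spec's pp-line-indicator does.

```python
import re

# Hypothetical, simplified stand-ins for the lexer fragments.
WS = r"[ \t\u00A0]"  # WHITESPACE: only a few of the C# whitespace characters
# PP_LINE_INDICATOR: digits, optionally whitespace plus a quoted file name,
# or an identifier such as 'default'
INDICATOR = r'(?:\d+(?:' + WS + r'+"[^"\n]*")?|[A-Za-z_]\w*)'

# Shape of the quoted rule: no whitespace around PP_LINE_INDICATOR.
ORIGINAL = re.compile(WS + r"*#" + WS + r"*line" + INDICATOR + r"\n")

# Suggested shape: WHITESPACE* before and after PP_LINE_INDICATOR.
SUGGESTED = re.compile(
    WS + r"*#" + WS + r"*line" + WS + r"*" + INDICATOR + WS + r"*\n"
)


def matches(pattern: re.Pattern, text: str) -> bool:
    """True if the whole directive line (including its newline) matches."""
    return pattern.fullmatch(text) is not None


for text in ["#line200\n", "#line 200\n", '  # line 200 "foo.cs"  \n']:
    print(repr(text), matches(ORIGINAL, text), matches(SUGGESTED, text))
```

Only the second pattern accepts `#line 200`; the first one needs the number written directly after the keyword, as in `#line200`.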

> fragment PP_LINE_INDICATOR
>     : INTEGER_LITERAL PP_FILE_NAME?
>     | IDENTIFIER_OR_KEYWORD
>     ;
> 
> fragment PP_FILE_NAME : STRING_LITERAL ;
> 
> fragment HASH : '#' ;

I would suggest left-factoring and using actions to change the token type:

  // Empty fragment rules: they only define the token types set below
  fragment PP_DIAGNOSTIC : ;
  fragment PP_LINE : ;

  PP_UNRECOGNIZED
    : WHITESPACE* HASH WHITESPACE*  // common prefix, matched only once
      ( (ERROR | WARNING)=> INPUT_CHARACTER* { $type = PP_DIAGNOSTIC; }
      | (LINE)=> LINE WHITESPACE* PP_LINE_INDICATOR WHITESPACE*
                                             { $type = PP_LINE; }
      | INPUT_CHARACTER* // leave as type PP_UNRECOGNIZED [1]
      )? NEWLINE
    ;


[1] Omit this line if you want an unrecognized directive to be a lexer
    mismatch error, but I would suggest keeping it for better error recovery.
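The following Python function is a rough model (not ANTLR code, and not what ANTLR generates) of the decision the left-factored rule makes: the shared WHITESPACE* HASH WHITESPACE* prefix is consumed once, a look at the keyword stands in for the predicates, and the result is a token type name that defaults to PP_UNRECOGNIZED. The names `classify_directive` and `WS` are hypothetical, `WS` covers only a few whitespace characters, and the line indicator after `line` is not validated.

```python
from typing import Optional

WS = " \t\u00A0"  # hypothetical subset of the C# WHITESPACE characters


def classify_directive(line: str) -> Optional[str]:
    """Mimic the left-factored PP_UNRECOGNIZED rule on one source line.

    Returns the token type the rule would end up with, or None if the line
    is not a directive at all. Only the keyword is inspected; the line
    indicator after 'line' is not checked here.
    """
    if not line.endswith("\n"):  # the rule always ends with NEWLINE
        return None
    body = line[:-1]

    # Common prefix, matched exactly once: WHITESPACE* HASH WHITESPACE*
    i = 0
    while i < len(body) and body[i] in WS:
        i += 1
    if i == len(body) or body[i] != "#":
        return None
    i += 1
    while i < len(body) and body[i] in WS:
        i += 1
    rest = body[i:]

    # Keyword lookahead stands in for the syntactic predicates, and the
    # returned name stands in for the { $type = ...; } actions.
    if rest.startswith(("error", "warning")):
        return "PP_DIAGNOSTIC"
    if rest.startswith("line"):
        return "PP_LINE"
    return "PP_UNRECOGNIZED"  # the fallback alternative marked [1]


for src in ["#error oops\n", "  # line 200\n", "#pragma x\n", "int x;\n"]:
    print(repr(src), classify_directive(src))
```

In the grammar itself the same effect comes from the predicates and the `$type` assignments, so one lexer rule yields three token types without re-matching the prefix for each alternative.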

-- 
David-Sarah Hopwood  ⚥  http://davidsarah.livejournal.com


List: http://www.antlr.org/mailman/listinfo/antlr-interest