I used a hand-crafted implementation of TokenSource between the lexer and parser. In the preprocessor, whenever I manipulated a token I replaced it with an instance of a new token class derived from CommonToken (call it SubstitutedToken). It holds a linked list leading from the token's effective position in the stream (stored in the CommonToken fields) all the way back to the original location (file and position) where the token was defined. When a plain CommonToken is substituted, the list has a single node containing the source position where that token was defined. When a SubstitutedToken is itself substituted, a new node for the token's previous effective position is pushed onto the list, and the new head pointer is stored in the new token.
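The token described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual implementation: it does not use the ANTLR runtime, so `SubstitutedToken` here is a plain stand-in rather than a subclass of CommonToken, and `PositionNode`, `TokenOrder`, `ofOriginal`, and `substitutedAt` are hypothetical names. The file part of each position is left out for brevity. The comparator keeps walking down the list when the first nodes tie, which is an addition of this sketch for nested macros.

```java
import java.util.Comparator;

/** One node of the position list: a place in the source text (file omitted for brevity). */
final class PositionNode {
    final int line;
    final int column;
    final int length;
    final PositionNode next; // older position; null at the original definition

    PositionNode(int line, int column, int length, PositionNode next) {
        this.line = line;
        this.column = column;
        this.length = length;
        this.next = next;
    }
}

/** Stand-in for the SubstitutedToken idea; the real class would derive from CommonToken. */
final class SubstitutedToken {
    final String text;
    final int line;             // effective line in the preprocessed stream
    final int column;           // effective column in the preprocessed stream
    final PositionNode history; // head of the position list

    private SubstitutedToken(String text, int line, int column, PositionNode history) {
        this.text = text;
        this.line = line;
        this.column = column;
        this.history = history;
    }

    /** A plain token defined at (defLine, defColumn) is substituted at (line, column). */
    static SubstitutedToken ofOriginal(String text, int defLine, int defColumn,
                                       int line, int column) {
        PositionNode definedAt = new PositionNode(defLine, defColumn, text.length(), null);
        return new SubstitutedToken(text, line, column, definedAt);
    }

    /** Substituted again: push the previous effective position as the new list head. */
    SubstitutedToken substitutedAt(int newLine, int newColumn, int replacedLength) {
        PositionNode head = new PositionNode(line, column, replacedLength, history);
        return new SubstitutedToken(text, newLine, newColumn, head);
    }
}

final class TokenOrder {
    private static int comparePositions(int l1, int c1, int l2, int c2) {
        return l1 != l2 ? Integer.compare(l1, l2) : Integer.compare(c1, c2);
    }

    /** Orders by effective position first, then by the position lists, head first. */
    static final Comparator<SubstitutedToken> BY_STREAM_ORDER = (a, b) -> {
        int c = comparePositions(a.line, a.column, b.line, b.column);
        PositionNode x = a.history;
        PositionNode y = b.history;
        while (c == 0 && x != null && y != null) {
            c = comparePositions(x.line, x.column, y.line, y.column);
            x = x.next;
            y = y.next;
        }
        return c;
    };

    public static void main(String[] args) {
        // Hypothetical input:  line 1: `define z 1 2   line 2: `define w `z   line 3: `w
        SubstitutedToken one = SubstitutedToken.ofOriginal("1", 1, 11, 2, 11).substitutedAt(3, 1, 2);
        SubstitutedToken two = SubstitutedToken.ofOriginal("2", 1, 13, 2, 11).substitutedAt(3, 1, 2);
        System.out.println(BY_STREAM_ORDER.compare(one, two) < 0); // prints "true"
    }
}
```

In the hypothetical input, both tokens end up at line 3, column 1 of the preprocessed stream, and both have the same list head (line 2, column 11). They are told apart only by the node for their original definitions on line 1.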
    `define x 3
    `define y `x
    `y

In this case, token `y is eventually replaced with a SubstitutedToken which appears at (line 3, column 1, length 1, text "3") containing the following linked list:

  Line 3, column 1, length 2 (list head, the location where `y was substituted with `x)
  Line 2, column 11, length 2 (the location where `x was substituted with '3')
  Line 1, column 11, length 1 (the actual source location where the token '3' is defined)

This list allows true relative ordering of all tokens in the processed source: when two tokens appear to be at the same location in the preprocessed stream, you simply compare the positions of the first node in the position list.

Sam

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of A Z
Sent: Monday, April 04, 2011 12:13 AM
To: Martin d'Anjou
Cc: [email protected]
Subject: Re: [antlr-interest] Q: how to incorporate a preprocessor in the flow?

Hi Martin,

I just completed an SV preprocessor which can parse UVM 1.0 successfully. After 2 revisions I settled on a completely separate preprocessor (lexer and parser). As you saw, you need to tokenize the macro_text in order to easily support macros with arguments and detect the three escaped tokens `", `\`" and ``. I'm not sure how well a lexer-only approach could handle cases where a macro substitution can merge text with a previously lexed token. The separate approach still has flaws, such as the difficulty of producing good error reporting. Of course I could be missing an obvious easy solution.

On Sun, Apr 3, 2011 at 9:51 PM, Martin d'Anjou <[email protected]> wrote:
> Hello,
>
> I am trying to find a way to incorporate a preprocessor in the ANTLR
> flow. I thought of doing this before the lexer, but I need to tokenize
> the incoming char stream for macro substitution to be easy.
> I thought of doing it between the lexer and the parser, and replace the
> preprocessor tokens with their expansion before feeding the token
> stream to the parser, so I guess I would end up using something like
> the TokenRewriteStream??? Can someone steer me in the right direction
> please? Or should I be using lexer rule actions? In which case, any
> example on how to access the token stream of the replacement token
> list of an identifier? Too many questions sorry.
>
> The language I am hoping to tokenize is SystemVerilog and has C-like
> preprocessor macros (`include, `ifdef, `define NAME(params,...), token
> concatenation, etc.).
>
> Regards,
> Martin
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
