The usual way is to write a pre-processor that sends the processed source to the parser along with file and line number stamps, which you store in a table and cross-reference with the tokens. Less complex pre-processors, such as C#'s, can be handled within the lexer. Look at the way the C pre-processor works for an example. When the pre-processor gets complicated, it is probably better as a separate phase in the tool chain, unless there is a severe performance penalty.
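
A minimal sketch of the stamp-table idea, in Python rather than ANTLR code. It assumes the sources are held in an in-memory dict and that only `include is handled; the names Origin, preprocess and origin_of are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Origin:
    """Hypothetical stamp: where one line of processed text came from."""
    file: str
    line: int  # 1-based line number in the original file


def preprocess(name, files, out=None, table=None):
    """Expand `include "x" directives (assumed syntax) and build the stamp table.

    `files` maps file names to their text. Returns the processed lines and a
    parallel table holding one Origin per processed line.
    """
    if out is None:
        out, table = [], []
    for lineno, text in enumerate(files[name].splitlines(), start=1):
        stripped = text.strip()
        if stripped.startswith("`include"):
            included = stripped.split('"')[1]  # name between the quotes
            preprocess(included, files, out, table)
        else:
            out.append(text)
            table.append(Origin(name, lineno))
    return out, table


def origin_of(table, output_line):
    """Cross-reference a 1-based line of the processed text with its source."""
    return table[output_line - 1]


# Illustrative input, not taken from any real project.
files = {
    "top.sv": 'module top;\n`include "defs.svh"\nendmodule\n',
    "defs.svh": "wire a;\nwire b;\n",
}
out, table = preprocess("top.sv", files)
```

A token that the parser reports on line 3 of the processed text would then be looked up with origin_of(table, 3), which points back into defs.svh. A real tool would also need column information and macro expansion, which this sketch leaves out.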

Jim

> -----Original Message-----
> From: [email protected] [mailto:antlr-interest-[email protected]]
> On Behalf Of Phil Ratzloff
> Sent: Wednesday, April 06, 2011 6:33 AM
> To: [email protected]
> Subject: Re: [antlr-interest] Q: how to incorporate a preprocessor in
> the flow?
>
> This seems like a useful feature to have. Is it reasonable to consider
> making this easier in antlr4?
>
>
> -----Original Message-----
> From: [email protected] [mailto:antlr-interest-[email protected]]
> On Behalf Of A Z
> Sent: Tuesday, April 05, 2011 2:16 AM
> To: Martin d'Anjou
> Cc: [email protected]
> Subject: Re: [antlr-interest] Q: how to incorporate a preprocessor in
> the flow?
>
> I tried that approach when I first started with ANTLR but had
> difficulty handling arbitrary token rearrangement. Early on I couldn't
> figure out how to backtrack in the token stream in order to detect
> identifier construction using macros. Something like the following
> requires that 'prefix' be lexed again after macro substitution in order
> to detect if the string from suffix and 'prefix' will be merged into
> one identifier.
>
> define suffix(name) name
> prefix`suffix
>
> We use this often in RTL for bus port lists. Even though the spec seems
> to explicitly disallow this, Modelsim and DC will accept it. Lexing
> twice solves this case easily but now the tokens point to a
> non-existent source.
>
>
> On Mon, Apr 4, 2011 at 8:59 PM, Martin d'Anjou <[email protected]>
> wrote:
>
> > Hi,
> >
> > Thanks to both of you for sharing your approaches. Right now I am
> > pondering how to alter the sequence of tokens before they hit the
> > parser. Intuitively I want to have three processing units (lexer,
> > pre-processor, parser) connected together through io pipes of tokens
> > (e.g. token fifos), but this is not how ANTLR was architected (it's
> > how I would have done it in hardware though!).
> >
> > Martin
> >
> >
> > On 11-04-04 09:25 AM, Sam Harwell wrote:
> >
> >> I used a hand-crafted implementation of TokenSource between the lexer
> >> and parser. In the preprocessor, whenever I manipulated a token I
> >> used a new token class derived from CommonToken (call it
> >> SubstitutedToken) which contained a linked list leading from the
> >> effective position in the stream (stored in CommonToken) all the way
> >> back to the original location (file and position) of the token
> >> definition. When a CommonToken substitution occurs, the linked list
> >> has one node containing the original source position where defined.
> >> Whenever a SubstitutedToken substitution occurs, a new node for the
> >> token's previous effective position is added to the linked list and
> >> that new head pointer is stored in the new token.
> >>
> >> `define x 3
> >> `define y `x
> >> `y
> >>
> >> In this case, token `y is eventually replaced with a SubstitutedToken
> >> which appears at (line 2, column 1, length 1, text "3") containing
> >> the following linked list:
> >>
> >> Line 3, column 1, length 2 (list head, the location where `y was
> >> substituted with `x)
> >> Line 2, column 11, length 2 (the location where `x was substituted
> >> with '3')
> >> Line 1, column 11, length 1 (the actual source location where the
> >> token '3' is defined)
> >>
> >> This list allows true relative ordering of all tokens in the
> >> processed source: when two tokens appear to be at the same location
> >> in the preprocessed stream, you simply compare the positions of the
> >> first node in the position list.
> >>
> >> Sam
> >>
> >> -----Original Message-----
> >> From: [email protected]
> >> [mailto:[email protected]] On Behalf Of A Z
> >> Sent: Monday, April 04, 2011 12:13 AM
> >> To: Martin d'Anjou
> >> Cc: [email protected]
> >> Subject: Re: [antlr-interest] Q: how to incorporate a preprocessor in
> >> the flow?
> >>
> >> Hi Martin,
> >>
> >> I just completed an SV preprocessor which can parse UVM 1.0
> >> successfully. After 2 revisions I settled on a completely separate
> >> preprocessor (lexer and parser). As you saw, you need to tokenize the
> >> macro_text in order to easily support macros with arguments and
> >> detect the three escaped tokens `", `\`" and ``. I'm not sure how
> >> well a lexer only approach could handle cases where a macro
> >> substitution can merge text with a previously lexed token. The
> >> separate approach still has flaws, such as good error reporting. Of
> >> course I could be missing an obvious easy solution.
> >>
> >>
> >> On Sun, Apr 3, 2011 at 9:51 PM, Martin d'Anjou<[email protected]> wrote:
> >>
> >> Hello,
> >>>
> >>> I am trying to find a way to incorporate a preprocessor in the ANTLR
> >>> flow. I thought of doing this before the lexer, but I need to
> >>> tokenize the incoming char stream for macro substitution to be easy.
> >>> I thought of doing it between the lexer and the parser, and replace
> >>> the preprocessor tokens with their expansion before feeding the
> >>> token stream to the parser, so I guess I would end up using
> >>> something like the TokenRewriteStream??? Can someone steer me in the
> >>> right direction please? Or should I be using lexer rule actions? In
> >>> which case, any example on how to access the token stream of the
> >>> replacement token list of an identifier? Too many questions sorry.
> >>>
> >>> The language I am hoping to tokenize is SystemVerilog and has C-like
> >>> preprocessor macros (`include, `ifdef, `define NAME(params,...),
> >>> token concatenation, etc.).
> >>>
> >>> Regards,
> >>> Martin
> >>>
> >>>
> >>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> >>> Unsubscribe:
> >>> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
