I tried to devise a C preprocessor, but then I figured out that I
could write something like this:
---------------------------
#define A(arg) A_start (arg) A_end

#define A_start "this is A_start definition."
#define A_end "this is A_end definition."

A (
#undef A_start
#define A_start A_end
)
---------------------------

gcc preprocesses it into the following:
---------------------------
# 1 "a.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "a.c"
"this is A_end definition." () "this is A_end definition."
---------------------------

Another woe is filenames in angle brackets for #include. They
require a special case in the tokenizer.

So I gave up on it (a fully compliant C preprocessor). ;)

Other than that, the C preprocessor looks simple.

I hardly qualify as a student, though.

2010/3/30 Aaron Tomb <[email protected]>:
> The first is to integrate preprocessing into the library. Currently, the
> library calls out to GCC to preprocess source files before parsing them.
> This has some unfortunate consequences, however, because comments and macro
> information are lost. A number of program analyses could benefit from
> metadata encoded in comments, because C doesn't have any sort of formal
> annotation mechanism, but in the current state we have to resort to ugly
> hacks (at best) to get at the contents of comments. Also, effective
> diagnostic messages need to be closely tied to original source code. In the
> presence of pre-processed macros, column number information is unreliable,
> so it can be difficult to describe to a user exactly what portion of a
> program a particular analysis refers to. An integrated preprocessor could
> retain comments and remember information about macros, eliminating both of
> these problems.
>
> The second possible project is to create a nicer interface for traversals
> over Language.C ASTs. Currently, the symbol table is built to include only
> information about global declarations and those other declarations currently
> in scope. Therefore, when performing multiple traversals over an AST, each
> traversal must re-analyze all global declarations and the entire AST of the
> function of interest.
> A better solution might be to build a traversal that
> creates a single symbol table describing all declarations in a translation
> unit (including function- and block-scoped variables), for easy reference
> during further traversals. It may also be valuable to have this traversal
> produce a slightly-simplified AST in the process. I'm not thinking of
> anything as radical as the simplifications performed by something like CIL,
> however. It might simply be enough to transform variable references into a
> form suitable for easy lookup in a complete symbol table like I've just
> described. Other simple transformations such as making all implicit casts
> explicit, or normalizing compound initializers, could also be good.
>
> A third possibility, which would probably depend on the integrated
> preprocessor, would be to create an exact pretty-printer. That is, a
> pretty-printing function such that pretty . parse is the identity.
> Currently, parse . pretty should be the identity, but it's not true the
> other way around. An exact pretty-printer would be very useful in creating
> rich presentations of C source code --- think LXR on steroids.
>
> If you're interested in any combination of these, or anything similar, let
> me know. The deadline is approaching quickly, but I'd be happy to work
> together with a student to flesh any of these out into a full proposal.
>
> Thanks,
> Aaron
>
> --
> Aaron Tomb
> Galois, Inc. (http://www.galois.com)
> [email protected]
> Phone: (503) 808-7206
> Fax: (503) 350-0833
>
> _______________________________________________
> Haskell-Cafe mailing list
> [email protected]
> http://www.haskell.org/mailman/listinfo/haskell-cafe
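To illustrate the angle-bracket special case mentioned in the reply, here is a minimal sketch in C. It only shows that '<' has to be treated as the start of a header name when it directly follows #include, and as an ordinary operator everywhere else. The function name lex_header_name and its return convention are assumptions made up for this example; they are not taken from gcc, Language.C, or any other real preprocessor, and the sketch ignores comments, line continuations, and the macro-expanded form of #include.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
 * If `line` is an #include directive with a literal header name, copy the
 * name (without its delimiters) into `out` and return '<' or '"' to say
 * which form was used. Return 0 for anything else, including names that
 * do not fit in `out`. */
static int lex_header_name(const char *line, char *out, size_t out_size)
{
    assert(line != NULL && out != NULL && out_size > 0);

    const char *p = line;
    while (*p == ' ' || *p == '\t') p++;      /* leading blanks */
    if (*p++ != '#') return 0;                /* not a directive */
    while (*p == ' ' || *p == '\t') p++;      /* "#  include" is allowed */
    if (strncmp(p, "include", 7) != 0) return 0;
    p += 7;
    /* Reject longer identifiers that merely start with "include". */
    if (isalnum((unsigned char)*p) || *p == '_') return 0;
    while (*p == ' ' || *p == '\t') p++;

    /* Only in this position does '<' open a header name. */
    char open = *p;
    char close;
    if (open == '<') close = '>';
    else if (open == '"') close = '"';
    else return 0;   /* e.g. "#include HEADER": left to the general tokenizer */
    p++;

    const char *end = strchr(p, close);
    if (end == NULL || end == p) return 0;    /* unterminated or empty name */
    size_t len = (size_t)(end - p);
    if (len >= out_size) return 0;            /* name too long for buffer */
    memcpy(out, p, len);
    out[len] = '\0';
    return open;
}
```

With a buffer `char buf[32]`, calling `lex_header_name("#include <stdio.h>", buf, sizeof buf)` returns '<' and leaves "stdio.h" in buf, whereas the expression line "x = a < b;" returns 0, so the same character is handled differently depending on what precedes it.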
