On Tue, Mar 30, 2010 at 5:14 PM, Aaron Tomb <at...@galois.com> wrote:
> That's very good to hear!
>
> When it comes to preprocessing and exact printing, I think that there are
> various stages of completeness that we could support.
>
> 1) Add support for parsing comments to the Language.C parser. Keep using an
> external pre-processor but tell it to leave comments in the source code. The
> cpphs pre-processor can do this. The trickiest bit here would have to do
> with where to record the comments in the AST. What AST node is a given
> comment associated with? We could probably come up with some general rules,
> and perhaps certain comments, in weird locations, would still be ignored.
>
> 2) Support correct column numbers for source locations. This falls short of
> complete macro support, but covers one of the key problems that macros
> introduce. The mcpp preprocessor [1] has a special diagnostic mode where it
> adds special comments describing the origin of code that resulted from macro
> expansion. If the parser retained comments, we could use this information to
> help with exact pretty-printing.
>
> 3) Modify the pretty-printer to take position information into account when
> pretty-printing (at least optionally). As long as macro definitions
> themselves (as well as #ifdef, etc.) are not in the AST, the output will
> still not be exactly the same as the input, but it'll come closer.
>
> 4) Add full support for parsing and expanding macros internally, so that
> both macro definitions and expansions appear in the Language.C AST. This is
> probably a huge project, partly because macros do not have to obey the tree
> structure of the C language in any way. This is perhaps beyond the scope of
> a summer project, but the other steps could help prepare for it in the
> future, and still fully address some of the problems caused by the
> preprocessor along the way.

I haven't looked at the C spec on macros, but I'm pretty motivated and
would like to shoot for a big project.

> Do you think you'd be interested in some subset or variation of 1, 2, and 3?
> Are there other ideas you have? Things I've missed? Things you'd do
> differently?

I'm very interested in all 3 of them, and actually somewhat in #4, though
I'll have to do some reading to understand why you're saying it's such a
big undertaking.

>
> Thanks,
> Aaron
>
> [1] http://mcpp.sourceforge.net/
>
> On Mar 30, 2010, at 1:46 PM, Edward Amsden wrote:
>
>> I'd be very much interested in working on this library for GSoC. I'm
>> currently working on an idea for another project, but I'm not certain
>> how widely beneficial it would be.
>> The preprocessor and pretty-printing projects sound especially intriguing.
>>
>> On Tue, Mar 30, 2010 at 1:30 PM, Aaron Tomb <at...@galois.com> wrote:
>>>
>>> Hello,
>>>
>>> I'm wondering whether there's anyone on the list with an interest in
>>> doing additional work on the Language.C library for the Summer of Code.
>>> There are a few enhancements that I'd be very interested in seeing, and
>>> I'd love to be a mentor for such a project if there's a student
>>> interested in working on them.
>>>
>>> The first is to integrate preprocessing into the library. Currently, the
>>> library calls out to GCC to preprocess source files before parsing them.
>>> This has some unfortunate consequences, however, because comments and
>>> macro information are lost. A number of program analyses could benefit
>>> from metadata encoded in comments, because C doesn't have any sort of
>>> formal annotation mechanism, but in the current state we have to resort
>>> to ugly hacks (at best) to get at the contents of comments. Also,
>>> effective diagnostic messages need to be closely tied to original source
>>> code. In the presence of pre-processed macros, column number information
>>> is unreliable, so it can be difficult to describe to a user exactly what
>>> portion of a program a particular analysis refers to. An integrated
>>> preprocessor could retain comments and remember information about
>>> macros, eliminating both of these problems.
>>>
>>> The second possible project is to create a nicer interface for
>>> traversals over Language.C ASTs. Currently, the symbol table is built to
>>> include only information about global declarations and those other
>>> declarations currently in scope. Therefore, when performing multiple
>>> traversals over an AST, each traversal must re-analyze all global
>>> declarations and the entire AST of the function of interest.
>>> A better solution might be to build a traversal that creates a single
>>> symbol table describing all declarations in a translation unit
>>> (including function- and block-scoped variables), for easy reference
>>> during further traversals. It may also be valuable to have this
>>> traversal produce a slightly simplified AST in the process. I'm not
>>> thinking of anything as radical as the simplifications performed by
>>> something like CIL, however. It might simply be enough to transform
>>> variable references into a form suitable for easy lookup in a complete
>>> symbol table like I've just described. Other simple transformations,
>>> such as making all implicit casts explicit or normalizing compound
>>> initializers, could also be good.
>>>
>>> A third possibility, which would probably depend on the integrated
>>> preprocessor, would be to create an exact pretty-printer: a
>>> pretty-printing function such that pretty . parse is the identity on
>>> source text. Currently, parse . pretty should be the identity on ASTs,
>>> but it's not true the other way around. An exact pretty-printer would be
>>> very useful in creating rich presentations of C source code: think LXR
>>> on steroids.
>>>
>>> If you're interested in any combination of these, or anything similar,
>>> let me know. The deadline is approaching quickly, but I'd be happy to
>>> work together with a student to flesh any of these out into a full
>>> proposal.
>>>
>>> Thanks,
>>> Aaron
>>>
>>> --
>>> Aaron Tomb
>>> Galois, Inc.
>>> (http://www.galois.com)
>>> at...@galois.com
>>> Phone: (503) 808-7206
>>> Fax: (503) 350-0833
>>>
>>> _______________________________________________
>>> Haskell-Cafe mailing list
>>> Haskell-Cafe@haskell.org
>>> http://www.haskell.org/mailman/listinfo/haskell-cafe
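Aaron's third idea contrasts two round-trip properties: parse . pretty, which should already be the identity on ASTs, and pretty . parse, which an exact pretty-printer would make the identity on source text. The sketch below illustrates the difference on a hypothetical toy expression language. `Expr`, `parse`, `pretty` and the two property functions are illustrative names, not the Language.C API, and the parser deliberately throws away whitespace the way a conventional parser does.

```haskell
import Data.Char (isDigit, isSpace)

-- Hypothetical toy AST standing in for a C AST; not Language.C's types.
data Expr = Num Int | Add Expr Expr
  deriving (Eq, Show)

-- Canonical printer: fully parenthesised, one space around '+'.
-- It has no access to the layout of the original source.
pretty :: Expr -> String
pretty (Num n)   = show n
pretty (Add a b) = "(" ++ pretty a ++ " + " ++ pretty b ++ ")"

-- Conventional parser: whitespace is discarded before parsing, so
-- layout never reaches the AST. Accepts non-negative literals only.
parse :: String -> Maybe Expr
parse src =
  case expr (filter (not . isSpace) src) of
    Just (e, "") -> Just e
    _            -> Nothing
  where
    -- Parse one expression, returning it with the unconsumed input.
    expr :: String -> Maybe (Expr, String)
    expr ('(' : rest) = do
      (a, afterA) <- expr rest
      afterPlus <- case afterA of
        '+' : r -> Just r
        _       -> Nothing
      (b, afterB) <- expr afterPlus
      case afterB of
        ')' : r -> Just (Add a b, r)
        _       -> Nothing
    expr s@(c : _)
      | isDigit c = let (ds, r) = span isDigit s in Just (Num (read ds), r)
    expr _ = Nothing

-- "parse . pretty is the identity": print an AST, re-parse it, and
-- get the same AST back (holds for non-negative literals).
parsePrettyRoundTrip :: Expr -> Bool
parsePrettyRoundTrip e = parse (pretty e) == Just e

-- "pretty . parse is the identity": parse source, print it, and get
-- the same text back. Fails whenever the input's layout differs from
-- the canonical layout chosen by 'pretty'.
prettyParseRoundTrip :: String -> Bool
prettyParseRoundTrip s = fmap pretty (parse s) == Just s
```

In GHCi, `prettyParseRoundTrip "(1 + 2)"` is `True` only because the input already matches the canonical layout, while `prettyParseRoundTrip "(1+2)"` is `False`: the spacing never reaches the AST, so no printer working from the AST alone can recover it. Comments and macro invocations are lost in the same way, which is consistent with Aaron's point that an exact pretty-printer would probably depend on the integrated preprocessor.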