Parsing comments again (cpp linemarkers)
Hi,

I'm almost finished with the script (the stripped-down test case is at the bottom of this mail) which parses cpp-preprocessed output.

The peculiarity of parsing cpp output is that if the preprocessed file #includes a file, then cpp prints a linemarker line, like:

    # 1 "..\\mmstorageiface\\group\\bld.inf" 1

In my script I keep track of the current working directory and update it (in updateCurdir() below) whenever a linemarker rule (with flags 1 or 2) triggers.

The script works fine for most of my input files, but there are a few that break it. There are 5 kinds of sections, each started by the section terminal. The problem comes when the input file #includes another file in the middle of a section, like here:

    PRJ_MMPFILES
    MMSTORAGEIFACE.MMP
    ..\GROUP\PNMSG.MMP
    # 72 "bld.inf" 2
    makefile ..\group\convcolor.mk

The section above is started by the PRJ_MMPFILES terminal, then 2 paths to .mmp files follow, and then (unexpectedly) the cpp linemarker comes before the last path.

What could I do to solve this problem, please?

I can't just skip the linemarker comments as in the P::RD::FAQ. I also don't want to prepend the linemarker subrule to each of the 5 section rules, like here:

    prj_mmpfile: /PRJ_MMPFILES\b/i (linemarker | mmpfile[\%::prj_mmpfiles])(s?)

because that would make my script really unreadable and because I'd have to add an if-statement to each of the 5 actions in order to tell whether it was triggered by a linemarker or by a normal section entry.

Any advice, please?
Regards
Alex

#!/usr/bin/perl

use strict;
use vars qw($parser $text %prj_platforms %prj_exports %prj_testexports
            %prj_mmpfiles %prj_testmmpfiles);
use Parse::RecDescent;

# platforms to use when 'DEFAULT' has been specified
use constant DEFPLATS => qw(ARMI ARM4 THUMB WINS WINSCW);

#$RD_WARN = 1;
#$RD_HINT = 1;
#$RD_TRACE = 120;

use constant GRAMMAR => q(

inffile: chunk(s) /^\Z/

chunk: linemarker
     | prj_platform
     | prj_export
     | prj_testexport
     | prj_mmpfile
     | prj_testmmpfile
     | <error>

# cpp linemarker, as described in cpp manual Preprocessor output
linemarker: '#' <skip: '[ \t]+'> linenum path flag(s?) {
        ::updateCurdir($item{path}, $item[-1]);
    }

linenum: /\d+/

flag: '1' | '2' | '3' | '4'

prj_platform: /PRJ_PLATFORMS\b/i platform(s?) {
        # go through all specified platforms
        for my $p (@{$item[-1]}) {
            if ($p =~ /^DEFAULT$/i) {
                # store the names of 5 default platforms
                @::prj_platforms{::DEFPLATS} = (1) x 5;
            } else {
                $::prj_platforms{uc $p} = 1;
            }
        }
    }

# TODO: make these terminals ignore filenames like TOOLS.mk
platform: /ARM4\b/i |
          /ARMI\b/i |
          /DEFAULT\b/i |
          /MCOT\b/i |
          /MCOY\b/i |
          /MEIG\b/i |
          /MHELEN\b/i |
          /MINT\b/i |
          /MISA\b/i |
          /MLNK\b/i |
          /MTEMPLATE\b/i |
          /MWD2\b/i |
          /THUMB\b/i |
          # don't match TOOLS.mk
          /TOOLS(?=(?:\s|\Z))/i |
          /WINC\b/i |
          /WINSCW\b/i |
          /WINS\b/i

# the rules prj_export and prj_testexport below are similar, so they
# just pass a reference to the corresponding hash down to the export rule
prj_export: /PRJ_EXPORTS\b/i export[\%::prj_exports](s?)

prj_testexport: /PRJ_TESTEXPORTS\b/i export[\%::prj_testexports](s?)

# export statements are terminated by newlines, so don't skip newlines
export: path <skip: '[ \t]+'> path(?) {
        ::storeExpPath($arg[0], $item[1], $item[-1]->[0]);
    }

# the rules prj_mmpfile and prj_testmmpfile below are similar, so they
# just pass a reference to the corresponding hash down to the mmpfile rule
prj_mmpfile: /PRJ_MMPFILES\b/i mmpfile[\%::prj_mmpfiles](s?)

prj_testmmpfile: /PRJ_TESTMMPFILES\b/i mmpfile[\%::prj_testmmpfiles](s?)

mmpfile: makefile(?) path special(?) {
        ::storeMmpPath($arg[0], $item[1]->[0], $item{path}, $item[-1]->[0]);
    }

path: /"([^"]*)"/ {
        ::unixifyPath($1);
    }
    | ...!section ...!platform ...!linemarker /\S+/ {
        ::unixifyPath($item[-1]);
    }

section: /PRJ_EXPORTS\b/i |
         /PRJ_MMPFILES\b/i |
         /PRJ_PLATFORMS\b/i |
         /PRJ_TESTEXPORTS\b/i |
         /PRJ_TESTMMPFILES\b/i

makefile: /GNUMAKEFILE\b/i |
          /MAKEFILE\b/i |
          /NMAKEFILE\b/i

special: /TIDY\b/i |
         /IGNORE\b/i |
         /MANUAL\b/i |
         /SUPPORT\b/i

);

# the actual functions are larger, but I've skipped them here
sub unixifyPath($) { return 1; }
sub updateCurdir($$) { return 1; }
sub storeExpPath($$$) { return 1; }
sub storeMmpPath($$$$) { return 1; }

$parser = Parse::RecDescent->new(GRAMMAR) or die 'Bad grammar';
$text .= $_ while (<DATA>);
defined $parser->inffile($text) or die 'Bad text';

__DATA__
PRJ_PLATFORMS
DEFAULT WINC
PRJ_EXPORTS
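As an aside on the linemarker format discussed in this message: cpp emits lines of the form `# linenum "filename" flags`, where the flags are a (possibly empty) list of the digits 1 to 4. The following is a minimal sketch in plain Perl, independent of the Parse::RecDescent grammar above, that splits such a line into its parts. The helper name `parse_linemarker` is hypothetical, and the sketch assumes the filename is quoted and that backslashes inside the quotes are doubled, as in the first sample line of the message.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): split one cpp
# linemarker line into line number, file name and flags.
# Returns undef for lines that are not linemarkers.
sub parse_linemarker {
    my ($line) = @_;
    return
        unless $line =~ /^#\s+(\d+)\s+"((?:[^"\\]|\\.)*)"((?:\s+[1-4])*)\s*$/;
    my ($linenum, $file, $flags) = ($1, $2, $3);
    $file =~ s/\\(.)/$1/g;    # assumption: undo the escaping inside the quotes
    return {
        linenum => $linenum,
        file    => $file,
        flags   => [ split ' ', $flags ],
    };
}

# A single-quoted heredoc keeps the backslashes exactly as written.
chomp(my $sample = <<'END');
# 1 "..\\mmstorageiface\\group\\bld.inf" 1
END

my $m = parse_linemarker($sample);
printf "%s line %d flags [%s]\n",
    $m->{file}, $m->{linenum}, join(',', @{ $m->{flags} });
```

Run on the sample line, this should print `..\mmstorageiface\group\bld.inf line 1 flags [1]`. A line such as `PRJ_MMPFILES` yields undef, so the helper could in principle be used to recognise linemarkers wherever they occur, whether between sections or inside one.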
Re: Parsing comments again (cpp linemarkers)
I do not have an easy answer for you; perhaps one of the Wizards will.

On Thursday, Jun 3, 2004 [EMAIL PROTECTED] said:

> Hi,
>
> I'm almost finished with the script (the stripped-down test case is
> at the bottom of this mail) which parses cpp-preprocessed output.
>
> The peculiarity of parsing cpp output is that if the preprocessed
> file #includes a file, then cpp prints a linemarker line, like:
>
>     # 1 "..\\mmstorageiface\\group\\bld.inf" 1

So, in other words, this can appear at the top-most parsing level.

> In my script I keep track of the current working directory and
> update it (in updateCurdir() below) whenever a linemarker rule
> (with flags 1 or 2) triggers.
>
> The script works fine for most of my input files, but there are a
> few that break it. There are 5 kinds of sections, each started by
> the section terminal. The problem comes when the input file
> #includes another file in the middle of a section, like here:
>
>     PRJ_MMPFILES
>     MMSTORAGEIFACE.MMP
>     ..\GROUP\PNMSG.MMP
>     # 72 "bld.inf" 2
>     makefile ..\group\convcolor.mk

So in other words, this can *also* appear INSIDE a subordinate parsing level. Hmm...

> The section above is started by the PRJ_MMPFILES terminal, then
> 2 paths to .mmp files follow, and then (unexpectedly) the cpp
> linemarker comes before the last path.
>
> What could I do to solve this problem, please?

Seems to me like there is no obvious solution except to put it in two places :-).

> I can't just skip the linemarker comments as in the P::RD::FAQ.
> I also don't want to prepend the linemarker subrule to each of
> the 5 section rules, like here:
>
>     prj_mmpfile: /PRJ_MMPFILES\b/i (linemarker | mmpfile[\%::prj_mmpfiles])(s?)

OK, so then put the rule in mmpfile; it'll catch all five cases...

    mmpfile: makefile(?) path special(?)
                 { ::storeMmpPath($arg[0], $item[1]->[0], $item{path}, $item[-1]->[0]); }
           | linemarker

> because that would make my script really unreadable and

Unreadability is in the eye of the beholder.
I like using white space and alignment to make things readable, but that is just MHO.

> because I'd have to add an if-statement to each of the 5 actions
> in order to tell whether it was triggered by a linemarker or by
> a normal section entry.

Yep, sounds like you will, because the semantics subtly change between the two cases, and this sounds like a pathological thing to me based on the semantics of the input data.

> Any advice, please?

Since I'm talking and I can't shut up, I have to offer my opinion about your coding style wrt(1) side effects.

Inside of parsing rules, you update your internal tables early (i.e. the chdir rule triggers an update as soon as it's encountered). This can be fine, or it can cause unfortunate side effects which become parse dependent, particularly if you use negative look-ahead. There is *bound* to be one single corner case in some friggin' windoze file somewhere that will cause you to spend hours in debug before you end up slapping your forehead and shouting Doh!

I would advise that all side effects be implemented at the highest possible level, i.e. when you are Real Sure that you have the right data. At that point it is safe to update the tables. You are not making your own data structure, which is OK, but it can make it somewhat complicated when you go back to make some kind of data-dependent update. I would recommend that all side effects be implemented in the inffile rule, because it is the only place where you are Real Sure of what you have.

Also, another coding trick that makes code easy to maintain is to have keyword rules programmable:

    use constant GRAMMAR => q(
    { my @makefile_keywords = qw(GNUMAKEFILE MAKEFILE NMAKEFILE); }
    ...
    makefile: /\w+\b/
        { if (grep(uc($item[1]) eq $_, @makefile_keywords))
              { $item[1] } else { undef } }
    ...

This way, when you discover that you have to add Just One More Keyword, you just put it in the appropriate list.
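The advice about deferring side effects can be illustrated without Parse::RecDescent at all. The following is a minimal sketch in plain Perl, assuming a simplified line-oriented input; `collect_events`, `commit_events` and `%KNOWN_SECTION` are hypothetical names, not part of the original script, and unescaping of doubled backslashes in file names is left out. The first pass only builds a list of records and touches no tables; the tables and the current directory are updated in a second pass, once the whole input has been accepted.

```perl
use strict;
use warnings;

my %KNOWN_SECTION = map { $_ => 1 }
    qw(PRJ_PLATFORMS PRJ_EXPORTS PRJ_TESTEXPORTS PRJ_MMPFILES PRJ_TESTMMPFILES);

# Pass 1: classify every line into a record, with no side effects.
# Returns undef if the input is not acceptable.
sub collect_events {
    my ($text) = @_;
    my (@events, $in_section);
    for my $line (split /\n/, $text) {
        next if $line !~ /\S/;
        if ($line =~ /^#\s+\d+\s+"?([^"\s]+)"?((?:\s+[1-4])*)\s*$/) {
            push @events,
                { type => 'linemarker', file => $1, flags => [ split ' ', $2 ] };
        }
        elsif ($line =~ /^\s*(PRJ_\w+)\s*$/i) {
            return unless $KNOWN_SECTION{ uc $1 };    # unknown section name
            push @events, { type => 'section', name => uc $1 };
            $in_section = 1;
        }
        else {
            return unless $in_section;                # entry outside any section
            (my $entry = $line) =~ s/^\s+|\s+$//g;
            push @events, { type => 'entry', text => $entry };
        }
    }
    return \@events;
}

# Pass 2: only now update the tables and the current directory.
sub commit_events {
    my ($events) = @_;
    my (%tables, $section);
    my $curdir = '.';
    for my $e (@$events) {
        if ($e->{type} eq 'linemarker') {
            # only flags 1 (entering a file) and 2 (returning to one) matter
            next unless grep { $_ == 1 || $_ == 2 } @{ $e->{flags} };
            (my $dir = $e->{file}) =~ s{[\\/]?[^\\/]*$}{};    # drop the file name
            $curdir = length($dir) ? $dir : '.';
        }
        elsif ($e->{type} eq 'section') {
            $section = $e->{name};
        }
        else {
            push @{ $tables{$section} }, { dir => $curdir, text => $e->{text} };
        }
    }
    return \%tables;
}

my $input = <<'END';
PRJ_MMPFILES
MMSTORAGEIFACE.MMP
..\GROUP\PNMSG.MMP
# 72 "bld.inf" 2
makefile ..\group\convcolor.mk
END

my $events = collect_events($input) or die "Bad text\n";
my $tables = commit_events($events);
print scalar @{ $tables->{PRJ_MMPFILES} }, " entries in PRJ_MMPFILES\n";
```

Because a linemarker is just another record in the list, it can turn up between sections or in the middle of one without the per-section code having to know about it. With Parse::RecDescent the same idea would presumably mean having each rule action return a record and letting the top-level rule walk the collected list, but that variant is not shown here.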
(1) wrt: With Respect To - at Intel we love TLAs - Three Letter Acronyms

> Regards
> Alex

--
Intel, Corp.
5000 W. Chandler Blvd.
Chandler, AZ 85226