Re: help: defining grammer?
What you want to do is hard as a generic solution, and you will find lots and lots of literature on program translation. That being said, you should break this up into a two-step process. Step one is to parse the generic language (which looks an *awful* lot like perl... :-) into a data structure which encapsulates a generic view of the constructs. Step two takes that data structure and produces the appropriate output.

For example, to handle

    $var = 10

the grammar would start out something like:

    assignment: lhs '=' expression
    lhs:        scalar | array | hash
    scalar:     /\$[\w]+/
    expression: {whole expression tree} ... atom ...
    atom:       constant ...
    constant:   string_constant | integer
    integer:    /[-+]?\d+/

Once you have the data structure, you can walk it. To produce Scheme, the program will see that you are assigning a constant to a variable and emit the "Define var = 10" string, etc.

This really is two problems, and each one should be thought out separately.

On Monday, Oct 25, 2004 Prentice, Phillip R said:

> Hello,
>
> I was wondering if anyone could help me begin to write some grammar for a conversion project. I basically have a generic language where variables, arrays, and hashes are defined. I want these variables to be translated to a tool-specific data structure, i.e. TCL, Scheme, etc. I am assuming I would have to create a different set of grammar(s) for each language I want the variables translated to. However, I am having trouble defining the grammar for parsing the generic language. Could you help me get started by showing me how I would go about parsing the 3 structures in the GenericLanguage section below? I would appreciate any insight or suggestions.
>
> Thanks in advance,
> -Phillip
>
> e.g.
>
> __GenericLanguage__
>
> $var = 10
> $var1 = variable1
> @var2 = [1,2,3]
> %var3 = {'key1'='value1', 'key2'='value2'}
>
> __Translate-to-SchemeLanguage__
>
> Define var1 = variable1
> Define var2 = '(1 2 3)
> ...
>
> __Translate-to-TCLLanguage__
>
> Set var1 variable

--
Intel, Corp.
5000 W. Chandler Blvd.
Chandler, AZ 85226
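The two-step split described in the reply can be sketched in plain Perl. This is only an illustration under stated assumptions: the helper names (parse_assignment, emit_scheme, emit_tcl) are hypothetical, regexes stand in for a real grammar, only scalar and array assignments are handled, and the Tcl list form is a guess that does not appear in the original post.

```perl
use strict;
use warnings;

# Step one: parse one line of the generic language into a neutral record.
# (Hypothetical helper; a real project would use a proper grammar here.)
sub parse_assignment {
    my ($line) = @_;
    if ($line =~ /^\s*\$(\w+)\s*=\s*([-+]?\d+|\w+)\s*$/) {
        return { kind => 'scalar', name => $1, value => $2 };
    }
    if ($line =~ /^\s*\@(\w+)\s*=\s*\[\s*(.*?)\s*\]\s*$/) {
        return { kind => 'array', name => $1, values => [ split /\s*,\s*/, $2 ] };
    }
    return;    # construct not recognised by this sketch
}

# Step two: one emitter per target language, all reading the same record.
sub emit_scheme {
    my ($node) = @_;
    return sprintf 'Define %s = %s', $node->{name}, $node->{value}
        if $node->{kind} eq 'scalar';
    return sprintf q{Define %s = '(%s)}, $node->{name}, join ' ', @{ $node->{values} };
}

sub emit_tcl {
    my ($node) = @_;
    return sprintf 'Set %s %s', $node->{name}, $node->{value}
        if $node->{kind} eq 'scalar';
    # Assumed Tcl list form; not taken from the original post.
    return sprintf 'Set %s {%s}', $node->{name}, join ' ', @{ $node->{values} };
}

for my $line ('$var = 10', '$var1 = variable1', '@var2 = [1,2,3]') {
    my $node = parse_assignment($line) or next;
    print emit_scheme($node), "\n";
    print emit_tcl($node),    "\n";
}
```

Adding another target language then means adding another emitter; the parsing step does not change.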
Re: evaluating an expression while parsing
On Thursday, Jul 15, 2004 Ted Zlatanov said:

> On Wed, 14 Jul 2004, [EMAIL PROTECTED] wrote:
>
> > I think it is better to create an executable and then execute it.
>
> Isn't this a lot harder to debug? It seems like dumping the contents of a data structure is a lot easier than dumping subroutines. With the high rate of P::RD initial grammar bugs, at least in my experience, this is an important consideration.
>
> Ted

Debugging *is* an issue, but it's a small one. I find it helpful to first focus on the parsing itself (i.e. just get the *grammar* right) and then go back and retrofit the actual application.

As far as debugging goes, you can still debug this using the perl debugger just like you debug anything else. The only difference is that it is a bit more challenging to insert a breakpoint. To insert a breakpoint in some bit of buried subroutine, simply insert the following statement:

    $DB::single=1;

and voila, you will break there with all of the access you normally have in the debugger.

IMHO, if you are looking for ease of debug, you need to avoid PR::D altogether... ;-)
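As a small sketch of the breakpoint trick, the subroutine below stands in for code buried inside a grammar action; store_value and the table layout are hypothetical names invented for the illustration. Run normally, the assignment to $DB::single has no visible effect; run under perl -d, execution should stop at the statement that follows it.

```perl
use strict;
use warnings;

# Hypothetical helper standing in for the body of a grammar action.
sub store_value {
    my ($table, $key, $value) = @_;
    {
        no warnings 'once';    # $DB::single may be mentioned only once in a script
        $DB::single = 1;       # under "perl -d" the debugger stops after this line
    }
    push @{ $table->{$key} }, $value;
    return scalar @{ $table->{$key} };
}

my %table;
store_value(\%table, 'LIBRARY', 'euser.lib');
print store_value(\%table, 'LIBRARY', 'efsrv.lib'), "\n";
```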
Re: rebuilding a string from the autotree datastructure.
On Tuesday, Jun 15, 2004 David Holden said:

> Hello,
>
> I would like to rebuild a parsed string from the data structure given by autotree. For example, given the date string "(2001)." and the following rules:
>
>     Date: lft_bracket Year YearLabel(?) rgt_bracket point
>     Year: /\d{4}/
>     YearLabel: /[a-z]/
>     point: '.'
>     lft_bracket: '('
>     rgt_bracket: ')'
>
> autotree gives the following data structure:
>
>     $VAR1 = bless( {
>         'Year' => bless( { '__VALUE__' => '2001' }, 'Year' ),
>         '__RULE__' => 'Date',
>         'point' => bless( { '__VALUE__' => '.' }, 'point' ),
>         'lft_bracket' => bless( { '__VALUE__' => '(' }, 'lft_bracket' ),
>         'rgt_bracket' => bless( { '__VALUE__' => ')' }, 'rgt_bracket' ),
>         'YearLabel(?)' => []
>     }, 'Date' );
>
> I would like to be able to reconstruct the string "(2001)." from this structure. The problem I have is that because it is a HASH it does not have order information, e.g. in the above the Year key comes before lft_bracket. Am I missing something?

No. Basically what you want to do is unparse the data. The unparser will need to know as much about the syntax as the parser. autotree gives you a best guess as to what you need, but if you want a structure that contains ordering information, you will need to take control of the structure that gets created. To do so, you do not need to abandon autotree:

    use strict;
    use vars qw($parser $text %top);
    use Data::Dumper;
    use Parse::RecDescent;
    $RD_WARN = 1;
    $RD_HINT = 1;
    $RD_TRACE = 120;
    use constant GRAMMAR => q(
        <autotree>
        Date: lft_bracket Year YearLabel(?) rgt_bracket point [EMAIL PROTECTED]
        Year: /\d{4}/
        YearLabel: /[a-z]/
        point: '.'
        lft_bracket: '('
        rgt_bracket: ')'
    );
    $parser = Parse::RecDescent->new(GRAMMAR) or die 'Bad grammar';
    $text .= $_ while (<DATA>);
    print Dumper($parser->Date($text));
    __DATA__
    (2001).

> thanks in advance.
>
> Dave.
>
> --
> Dr. David Holden.
> Thanks in advance:-
> Please avoid sending me Word or PowerPoint attachments.
> See: http://www.fsf.org/philosophy/no-word-attachments.html
> Show me your papers..: http://www.no2id.net/index.html
> Public GPG key available on request.
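To illustrate the point that the unparser needs to know the syntax too, here is a minimal sketch in plain Perl. It rebuilds the tree from the question by hand (no Parse::RecDescent needed to run it) and walks it with an explicit per-rule ordering table. The %ORDER table and the unparse function are assumptions made for the example; they are not part of autotree, and whitespace between tokens is not preserved.

```perl
use strict;
use warnings;

# The tree as dumped in the question, rebuilt by hand.
my $tree = bless {
    '__RULE__'     => 'Date',
    'lft_bracket'  => bless({ '__VALUE__' => '(' },    'lft_bracket'),
    'Year'         => bless({ '__VALUE__' => '2001' }, 'Year'),
    'YearLabel(?)' => [],
    'rgt_bracket'  => bless({ '__VALUE__' => ')' },    'rgt_bracket'),
    'point'        => bless({ '__VALUE__' => '.' },    'point'),
}, 'Date';

# The unparser's own copy of the syntax: item order for each non-terminal rule.
my %ORDER = (
    Date => [ 'lft_bracket', 'Year', 'YearLabel(?)', 'rgt_bracket', 'point' ],
);

sub unparse {
    my ($node) = @_;
    return $node unless ref($node);                              # plain string
    return join '', map { unparse($_) } @$node
        if ref($node) eq 'ARRAY';                                # (?) and (s) results
    return $node->{__VALUE__} if exists $node->{__VALUE__};     # terminal wrapper
    my $order = $ORDER{ ref($node) }
        or die 'no item order known for rule ' . ref($node) . "\n";
    return join '', map { unparse($node->{$_}) } @$order;
}

print unparse($tree), "\n";
```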
Re: Parsing comments again (cpp linemarkers)
I do not have an easy answer for you, perhaps one of the Wizards will.

On Thursday, Jun 3, 2004 [EMAIL PROTECTED] said:

> Hi,
>
> I'm almost finished with the script (the stripped-down test case is at the bottom of this mail) which parses cpp-preprocessed output.
>
> The peculiarity of parsing cpp output is that if the cpp-processed file #includes a file, then cpp prints a linemarker line, like:
>
>     # 1 "..\\mmstorageiface\\group\\bld.inf" 1

So, in other words, this can appear at the top-most parsing level.

> In my script I keep track of the current working directory and update it (in the updateCurdir() below) whenever a linemarker rule (with flags 1 or 2) triggers.
>
> The script works fine for most of my input files, but there are a few that break it. There are 5 kinds of sections started by the section terminal. And the problem comes when the input file #includes another file while being in the middle of one section, like here:
>
>     PRJ_MMPFILES
>     MMSTORAGEIFACE.MMP
>     ..\GROUP\PNMSG.MMP
>     # 72 "bld.inf" 2
>     makefile ..\group\convcolor.mk

So in other words, this can *also* appear INSIDE a subordinate parsing level. H...

> The section above is started by the PRJ_MMPFILES terminal, then 2 paths to .mmp files follow and then (unexpectedly) the cpp linemarker comes before the last path.
>
> What could I do now to solve this problem, please?

Seems to me like there is no obvious solution except to put it in two places :-).

> I can't just skip the linemarker comments as in P::RD::FAQ. I also don't want to prepend the linemarker subrule to each of the 5 section rules, like here:
>
>     prj_mmpfile: /PRJ_MMPFILES\b/i (linemarker | mmpfile[\%::prj_mmpfiles])(s?)

OK, so then put the rule in mmpfile, it'll catch all five cases...

    mmpfile: makefile(?) path special(?)
             { ::storeMmpPath($arg[0], $item[1]->[0], $item{path}, $item[-1]->[0]); }
           | linemarker

> because that would make my script really unreadable and

Unreadability is in the eye of the beholder. I like using white space and alignment to make things readable, but that is just MHO.

> because I'll have to add an if-statement into each of the 5 actions in order to distinguish if it was triggered by another linemarker or by the normal section entry.

Yep, sounds like you will, because the semantics subtly change between the two cases, and this sounds like a pathological thing to me based on the semantics of the input data.

> Any advice please?

Since I'm talking and I can't shut up, I have to offer my opinion about your coding style wrt(1) side effects. Inside of parsing rules, you update your internal tables early (i.e. the chdir rule triggers an update as soon as it's encountered). This can be fine, or it can cause unfortunate side effects which become parse dependent, particularly if you use negative look-ahead. There is *bound* to be one single corner case in some friggin' windoze file somewhere that will cause you to spend hours in debug before you end up slapping your forehead and shouting "Doh!".

I would advise that all side effects be implemented at the highest possible level, i.e. when you are Real Sure that you have the right data. At that point it is safe to update the tables. You are not making your own data structure, which is OK, but it can make it somewhat complicated when you go back to make some kind of data-dependent update. I would recommend that all side effects be implemented in the innfile rule because it is the only place that you are Real Sure of what you have.

Also, another coding trick that makes it easy to maintain code is to have keyword rules programmable:

    use constant GRAMMAR => q(
        { my @makefile_keywords = qw(GNUMAKEFILE MAKEFILE NMAKEFILE); ...
        makefile: /\w+\b/
            { if (grep(uc($item[1]) eq $_, @makefile_keywords)) { $item[1] } else { undef } }
        ...

This way, when you discover that you have to add Just One More Keyword, you just put it in the appropriate list.

(1) wrt: With Respect To - at Intel we love TLAs - Three Letter Acronyms

> Regards
> Alex
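The programmable keyword list can be tried outside a grammar with a few lines of plain Perl. This is a sketch only: makefile_keyword is a hypothetical helper, and the comparison upper-cases the token on the assumption that the keyword list itself is kept in upper case.

```perl
use strict;
use warnings;

# Keyword table; add Just One More Keyword here when it turns up.
my @makefile_keywords   = qw(GNUMAKEFILE MAKEFILE NMAKEFILE);
my %is_makefile_keyword = map { $_ => 1 } @makefile_keywords;

# Returns the token when it is a known keyword (case-insensitively) and undef
# otherwise, mirroring how an action can fail a production by returning undef.
sub makefile_keyword {
    my ($token) = @_;
    return $is_makefile_keyword{ uc $token } ? $token : undef;
}

for my $token (qw(makefile NMakefile convcolor)) {
    my $hit = makefile_keyword($token);
    printf "%-10s %s\n", $token, defined $hit ? 'keyword' : 'not a keyword';
}
```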
Re: grammar problems
It's massively helpful if you provide a complete (and working! - i.e. compiles) example. I would encourage you to go back to the PR::D POD and follow the examples for this kind of thing VERY CLOSELY.

The (a) problem is that ident does not disallow keywords:

    use strict;
    use vars qw($parser $text %top);
    use Data::Dumper;
    use Parse::RecDescent;
    $RD_WARN = 1;
    $RD_HINT = 1;
    $RD_TRACE = 120;
    my $grammar = q {
        <autotree>
        disj  : conj disjOp disj | conj
        conj  : term conjOp(?) conj | term
        term  : qualif(?) term2
        term2 : brack | phrase | ident
        brack : '(' disj ')'
        phrase: '"' ident(s?) '"'
        ident : /[a-zA-Z\d]+/ {$return=($item[1]=~/^(or|and)$/i)?undef:$item[1]}
        qualif: ident '='
        conjOp: /AND/i #| ''
        disjOp: /OR/i
    };
    $parser = Parse::RecDescent->new($grammar);
    print Dumper($parser->disj('a or b'));

This turns out to be critical because of your requirement that the keyword operator "and" be optional. This works by failing the rule in the exception case. Another way to do it is to use the lookahead feature.

On Friday, May 21, 2004 Jonas Wolf said:

> Following your recommendation, I have altered the query to read as follows:
>
>     my $grammar = q {
>         <autotree>
>         disj  : conj disjOp disj | conj
>         conj  : term conjOp(?) conj | term
>         term  : qualif(?) term2
>         term2 : brack | phrase | ident
>         brack : '(' disj ')'
>         phrase: '"' ident(s?) '"'
>         ident : /[a-zA-Z0-9]+/i
>         qualif: ident '='
>         conjOp: /AND/i #| ''
>         disjOp: /OR/i
>     };
>
> This should fix the precedence issue with 'qualif'. However, this shows the same misbehaviour as when I tried it. Again, a query 'a or b' is interpreted as 'a AND or AND b'. Any ideas?
>
> Jonas
>
> > Jonas Wolf/UK/[EMAIL PROTECTED] 21-05-04 09:16
> > To: Ron D. Smith [EMAIL PROTECTED]
> > cc: [EMAIL PROTECTED]
> > Subject: Re: grammar problems
> >
> > I would like to parse boolean queries like the following:
> >
> >     = means the two queries represent the same structure
> >
> >     this and that = this that
> >     this or that
> >     a b or c d = (a b) or (c d) = (a and b) or (c and d)
> >     this phrase word = this phrase and word
> >     abstract = error or content = mistake = (abstract = error) or (content = mistake)
> >     ...
> >
> > I hope that clarifies things.
> >
> > >     conj : term conjOp(?) conj | term
> >
> > Yes, that seems like a good alternative, I will have to try if it works.
> >
> > Thanks,
> > Jonas
> >
> > > > Hello,
> > > >
> > > > I recently downloaded the Parse::RecDescent package to parse boolean queries. I have several questions, not necessarily about the package itself, but rather about my grammar. I am currently using the following grammar:
> > > >
> > > >     my $grammar = q {
> > > >         <autotree>
> > > >         disj    : qualif(?) conj disjRec(s?)
> > > >         disjRec : disjOp conj
> > > >         conj    : term conjRec(s?)
> > > >         conjRec : conjOp term
> > > >         term    : brack | phrase | ident
> > > >         brack   : '(' disj ')'
> > > >         phrase  : '"' ident(s?) '"'
> > > >         ident   : /[a-zA-Z0-9]+/i
> > > >         qualif  : ident '='
> > > >         conjOp  : /AND/i
> > > >         disjOp  : /OR/i
> > > >     };
> > > >
> > > > I have several problems with this. Firstly, the precedence isn't quite what I want in 'qualif'. I would like 'ident' to bind tightly to whatever comes next to it. Currently it seems to associate it with the whole 'disj' that comes after.
> > >
> > > You do not provide any examples of what you are trying to parse, which would help. IMHO what you mean by the above paragraph is not clear, so I don't know what you want.
> > >
> > > > Secondly, I would like the 'conjOp' operator to be optional, and that the parser recognises this. This means a query for 'a b' would be interpreted as 'a AND b'. I tried replacing the conjOp rule with 'conjOp : /AND/i | ', but this does not work, as now a query 'a AND b' is interpreted as 'a AND and ...'.
> > >
> > > Perhaps:
> > >
> > >     conj : term conjOp(?) conj | term
> > >
> > > It might help if you followed the example given in the PR::D POD more closely.
> > >
> > > > Any help would be appreciated.
> > > >
> > > > Thanks,
> > > > Jonas.
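The lookahead alternative mentioned in the reply can be sketched without Parse::RecDescent as a small hand-written recursive-descent parser. Everything here is an assumption made for illustration (the function names, the tree shape, and the omission of brackets, phrases and qualifiers); the point is only that an identifier pattern with a negative lookahead refuses the words "and" and "or", which is what lets the AND operator be optional.

```perl
use strict;
use warnings;

# An identifier is a run of letters/digits that is not the whole word AND or OR.
my $IDENT = qr/(?!(?:and|or)\b)[a-zA-Z\d]+/i;

# disj : conj ( OR conj )*
sub parse_disj {
    my ($s) = @_;                       # reference to the query string
    my @alts = (parse_conj($s));
    push @alts, parse_conj($s) while $$s =~ /\G\s*or\b/gci;
    return @alts > 1 ? [ 'OR', @alts ] : $alts[0];
}

# conj : term ( AND? term )*
sub parse_conj {
    my ($s) = @_;
    my @terms;
    while (1) {
        my $saw_and = @terms && $$s =~ /\G\s*and\b/gci;   # the operator is optional
        if ($$s =~ /\G\s*($IDENT)/gc) { push @terms, $1; next; }
        die "identifier expected\n" if $saw_and || !@terms;
        last;
    }
    return @terms > 1 ? [ 'AND', @terms ] : $terms[0];
}

sub parse_query {
    my ($query) = @_;
    my $tree = parse_disj(\$query);
    $query =~ /\G\s*\z/gc or die "trailing input\n";
    return $tree;
}

# Render the tree in prefix form for inspection.
sub show {
    my ($t) = @_;
    return ref($t) ? '(' . join(' ', map { show($_) } @$t) . ')' : $t;
}

print show(parse_query($_)), "\n" for 'a or b', 'a b', 'a and b or c';
```

With the keyword check in place, 'a or b' can no longer be read as three identifiers joined by implicit ANDs.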
Re: option: /option/i /\w+/ optarg(?) ''
On Thursday, May 20, 2004 Alexander Farber said:

> Hi,
>
> why doesn't the question mark in the optarg(?) below work as I expect? In the trace I see that optarg consumes the '>' character and then the option rule fails, since the final '>' is missing. Isn't P::RD supposed to backtrack and say "ok, the optarg(?) didn't match anything here"?

Um, why would it do this? In fact, optarg *does* match zero or one times; it is option that does not match. This is a simple problem with your grammar where you got (exactly...) what you asked for. As an example, if you change optarg it will be better:

    optarg: /[^\s>]+/

Also, there is more than one way to do it, but I don't like your first rule for option, /<option/, and would prefer '<' 'option' as it is easier to maintain. If the problem is that there should not be a space between '<' and 'option', then change skip. But this is not related to your question, this is just me butting into your style... ;-)

> Of course the script below works when I change the optarg from /\S+/ to /\w+/, but I'm curious, why doesn't optarg(?) mean ZERO or one here?
>
> Thank you
> Alex
>
>     #!/usr/bin/perl -w
>     use strict;
>     use vars qw($parser $text %option);
>     use Data::Dumper;
>     use Parse::RecDescent;
>     $RD_WARN = 1;
>     $RD_HINT = 1;
>     $RD_TRACE = 120;
>     $parser = Parse::RecDescent->new(q(
>         genfile: chunk(s) /^\Z/
>         chunk: option | <error>
>         option: /<option/i /\w+/ optarg(?) '>'
>                 { push @{$::option{lc $item[2]}}, $item[-2]; }
>         optarg: /\S+/
>     )) or die 'Bad grammar';
>     $text .= $_ while (<DATA>);
>     defined $parser->genfile($text) or die 'Bad text';
>     print STDERR Data::Dumper->Dump([\%option], [qw(option)]);
>     __DATA__
>     <option one>
>     <option two>
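The behaviour can be reproduced in plain Perl by matching one terminal at a time and never going back, which is roughly how the rule is tried. This is a sketch under assumptions: match_option is a hypothetical function, and it assumes the delimiters lost from the post were angle brackets, as in '<option one>'.

```perl
use strict;
use warnings;

# Match "<option NAME [ARG]>" one terminal at a time, never retrying an
# earlier terminal once it has matched.
sub match_option {
    my ($text, $optarg_re) = @_;
    pos($text) = 0;
    $text =~ /\G\s*<option/gci or return;
    $text =~ /\G\s*(\w+)/gc    or return;
    my $name = $1;
    my $arg  = $text =~ /\G\s*($optarg_re)/gc ? $1 : undef;   # optarg(?)
    $text =~ /\G\s*>/gc        or return;                      # no second try for optarg
    return [ $name, $arg ];
}

for my $re (qr/\S+/, qr/[^\s>]+/) {
    my $result = match_option('<option one>', $re);
    print "$re: ", ($result ? "matched name '$result->[0]'" : 'option failed'), "\n";
}
```

With /\S+/ the optional argument swallows the closing '>' and the final literal has nothing left to match; excluding '>' from the argument pattern leaves the delimiter in place.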
Re: grammar problems
On Thursday, May 20, 2004 Jonas Wolf said:

> Hello,
>
> I recently downloaded the Parse::RecDescent package to parse boolean queries. I have several questions, not necessarily about the package itself, but rather about my grammar. I am currently using the following grammar:
>
>     my $grammar = q {
>         <autotree>
>         disj    : qualif(?) conj disjRec(s?)
>         disjRec : disjOp conj
>         conj    : term conjRec(s?)
>         conjRec : conjOp term
>         term    : brack | phrase | ident
>         brack   : '(' disj ')'
>         phrase  : '"' ident(s?) '"'
>         ident   : /[a-zA-Z0-9]+/i
>         qualif  : ident '='
>         conjOp  : /AND/i
>         disjOp  : /OR/i
>     };
>
> I have several problems with this. Firstly, the precedence isn't quite what I want in 'qualif'. I would like 'ident' to bind tightly to whatever comes next to it. Currently it seems to associate it with the whole 'disj' that comes after.

You do not provide any examples of what you are trying to parse, which would help. IMHO what you mean by the above paragraph is not clear, so I don't know what you want.

> Secondly, I would like the 'conjOp' operator to be optional, and that the parser recognises this. This means a query for 'a b' would be interpreted as 'a AND b'. I tried replacing the conjOp rule with 'conjOp : /AND/i | ', but this does not work, as now a query 'a AND b' is interpreted as 'a AND and ...'.

Perhaps:

    conj : term conjOp(?) conj | term

It might help if you followed the example given in the PR::D POD more closely.

> Any help would be appreciated.
>
> Thanks,
> Jonas.
Re: Parsing binary AND/OR expressions, with brackets
1) you needed to fix the syntax error. (It was the 'i' modifier on the (s /&&/i).)
2) you did not define nested parentheses as the highest priority and allow for any number of them
3) the skip was *demanding* whitespace to be there ('+' instead of '*')

The following produces output, but I'm still not sure what you are doing...

    use strict;
    use vars qw($parser $text %top);
    use Data::Dumper;
    use Parse::RecDescent;
    $RD_WARN = 1;
    $RD_HINT = 1;
    $RD_TRACE = 120;
    $parser = Parse::RecDescent->new(q(
        mmpfile: chunk(s) /^\Z/
        chunk: assignment | ifdef | <error>
        assignment: keyword <skip: '[ \t]*'> value(s)
                    { push @{$::top{uc $item{keyword}}}, @{$item[-1]}; }
        ifdef: /#\s*if/ <skip: '[ \t]*'> conjunction <skip: $item[2]> chunk(s?) /#\s*endif/
        conjunction: disjunction(s /&&/)
        disjunction: unary_expr(s /\|\|/)
        unary_expr: '!' defined_expr | defined_expr
        defined_expr: 'defined' '(' value ')' | 'defined' value | '(' conjunction ')'
        value: /VAL\d+\b/i
        keyword: /KEY\d+\b/i
    )) or die 'Bad grammar';
    $text .= $_ while (<DATA>);
    defined $parser->mmpfile($text) or die 'Bad text';
    print Data::Dumper->Dump([\%top], [qw(top)]);
    __DATA__
    #if defined val1
    key1 val2
    #endif
    #if ! ( (defined ( val3 ) ) || ( defined (val4) ) )
    key2 val5
    #endif

On Monday, May 17, 2004 Alexander Farber said:

> Hi,
>
> could someone please help me a few steps further? I'm almost finished with the parser I need. The only missing part is grokking C-preprocessor-like #if defined / #endif expressions, which can be surrounded by brackets and contain && and || as binary operators.
>
> Unfortunately my script already fails with some syntax error, which I couldn't fix yet:
>
>     Parse::RecDescent: Treating conjunction: as a rule declaration
>     Warning: Undefined (sub)rule keyword used in a production.
>     (Hint: Will you be providing this rule later, or did you perhaps misspell keyword? Otherwise it will be treated as an immediate reject.)
>     Warning: Undefined (sub)rule value used in a production.
>     (Hint: Will you be providing this rule later, or did you perhaps misspell value? Otherwise it will be treated as an immediate reject.)
>
> Also, any suggestions please on how to handle the nested brackets properly? I keep looking at demo_operator.pl and other P::RD examples but couldn't figure it out yet...
>
> What I also don't know yet is in which data structure to save the binary operators and conditions that I'm trying to parse. I'm trying to convert the format below to GNU make's ifdef/ifndef.
>
> Regards
> Alex
>
>     #!/usr/bin/perl -w
>     use strict;
>     use vars qw($parser $text %top);
>     use Data::Dumper;
>     use Parse::RecDescent;
>     $RD_WARN = 1;
>     $RD_HINT = 1;
>     $RD_TRACE = 120;
>     $parser = Parse::RecDescent->new(q(
>         mmpfile: chunk(s) /^\Z/
>         chunk: assignment | <error>
>         assignment: keyword <skip: '[ \t]+'> value(s)
>                     { push @{$::top{uc $item{keyword}}}, @{$item[-1]}; }
>         ifdef: /#\s*if/ <skip: '[ \t]+'> condition <skip: $item[2]> chunk(s?) /#\s*endif/
>         condition: '(' conjunction ')' | conjunction
>         conjunction: disjunction(s /&&/i)
>         disjunction: unary_expr(s /\|\|/i)
>         unary_expr: '!' defined_expr | defined_expr
>         defined_expr: 'defined' '(' value ')' | 'defined' value
>         value: /VAL\d+\b/i
>         keyword: /KEY\d+\b/i
>     )) or die 'Bad grammar';
>     $text .= $_ while (<DATA>);
>     defined $parser->mmpfile($text) or die 'Bad text';
>     print Data::Dumper->Dump([\%top], [qw(top)]);
>     __DATA__
>     #if defined val1
>     key1 val2
>     #endif
>     #if ! ( (defined ( val3 ) ) || ( defined (val4) ) )
>     key2 val5
>     #endif
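For the nested-bracket question, one way to see the shape of the problem is a tiny hand-written evaluator in plain Perl where a parenthesised expression is just another primary. This is a sketch under assumptions, not the grammar from the thread: the function names are hypothetical, names are compared in upper case, and '&&' binds tighter than '||' as in C (the grammar above nests the two the other way round).

```perl
use strict;
use warnings;

# Evaluate a cpp-style condition such as "! ( defined(A) || defined B )".
# $defined is a hash reference whose keys are the upper-cased defined names.
sub eval_condition {
    my ($src, $defined) = @_;
    my @tok = $src =~ /(\|\||&&|[!()]|\w+)/g;    # crude tokenizer: other characters are skipped
    my $value = parse_or(\@tok, $defined);
    die "unexpected token '$tok[0]'\n" if @tok;
    return $value;
}

sub parse_or {                                   # or : and ( '||' and )*
    my ($tok, $def) = @_;
    my $v = parse_and($tok, $def);
    while (@$tok && $tok->[0] eq '||') {
        shift @$tok;
        my $rhs = parse_and($tok, $def);
        $v = ($v || $rhs) ? 1 : 0;
    }
    return $v;
}

sub parse_and {                                  # and : unary ( '&&' unary )*
    my ($tok, $def) = @_;
    my $v = parse_unary($tok, $def);
    while (@$tok && $tok->[0] eq '&&') {
        shift @$tok;
        my $rhs = parse_unary($tok, $def);
        $v = ($v && $rhs) ? 1 : 0;
    }
    return $v;
}

sub parse_unary {                                # unary : '!' unary | primary
    my ($tok, $def) = @_;
    if (@$tok && $tok->[0] eq '!') {
        shift @$tok;
        return parse_unary($tok, $def) ? 0 : 1;
    }
    return parse_primary($tok, $def);
}

sub parse_primary {    # primary : '(' or ')' | 'defined' '(' name ')' | 'defined' name
    my ($tok, $def) = @_;
    my $t = shift @$tok;
    die "unexpected end of condition\n" unless defined $t;
    if ($t eq '(') {
        my $v = parse_or($tok, $def);
        expect($tok, ')');
        return $v;
    }
    die "expected 'defined' or '(' but got '$t'\n" unless $t eq 'defined';
    my $paren = @$tok && $tok->[0] eq '(';
    shift @$tok if $paren;
    my $name = shift @$tok;
    die "name expected\n" unless defined $name && $name =~ /^\w+$/;
    expect($tok, ')') if $paren;
    return $def->{ uc $name } ? 1 : 0;
}

sub expect {
    my ($tok, $want) = @_;
    my $got = shift @$tok;
    die "expected '$want'\n" unless defined $got && $got eq $want;
}

my %defined = (VAL4 => 1);
print eval_condition('! ( (defined ( val3 ) ) || ( defined (val4) ) )', \%defined), "\n";
print eval_condition('defined val4 && ! defined val3', \%defined), "\n";
```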
Re: keyword value(s) newline
On Thursday, May 13, 2004 [EMAIL PROTECTED] said:

> > 1) you did not read the section on skip very carefully.
>
> The "Skipping between terminals" section in the man page doesn't say explicitly that <skip: /regex/> is not supported. There is an example <skip: qr/[:,]/> and I was misled by that.

Um, yeah, well, this is a simple issue of perl syntax. The '/' in the above is not a regular expression delimiter but a quote delimiter.

> > 2) you do not have balanced delimiters in your parse description
>
> Could you please say a bit more about what you mean here?

Simple. The original supplied script had this:

    push @::value, join ' ', @{$item{'value'};

Notice that this is simply missing a trailing '}' before the ';'. While this seems a simple mistake, I have noticed that PR::D does not tolerate perl syntax errors well and will supply misleading and possibly erroneous error messages.

> > 4) You did not look at the trace results that you printed out very carefully,
>
> I do look at the trace, but P::RD is a tough module.

That's putting it mildly... The reason for my admonition was simple. The first thing printed out was this:

    Parse::RecDescent: Treating mmpfile: as a rule declaration
    Parse::RecDescent: Treating chunk(s) as a one-or-more subrule match
    Parse::RecDescent: Treating /^\Z/ as a /../ pattern terminal
    Parse::RecDescent: Treating chunk: as a rule declaration
    Parse::RecDescent: Treating comment as a subrule match
    Parse::RecDescent: Treating | as a new production
    Warning: Undefined (sub)rule comment used in a production.
    (Hint: Will you be providing this rule later, or did you perhaps misspell comment? Otherwise it will be treated as an immediate reject.)

This should have been a Dead Giveaway that something was very wrong. Not only that, but it points you directly to the first problem (simply because that particular production with the skip was where PR::D went awry).

> For example I'm irritated by its rightmost column. Near the top it shows:
>
>     |c_comment |Didn't match rule                 |
>     | comment  |Didn't match subrule: [c_comment] |
>     | comment  |Trying production: [cpp_comment]  |
>     | comment  |                                  |\nTARGET
>     |          |                                  |CNetB.dll\nTARGETTYPE
>     |          |                                  |dll\nUID 0x1E5E
>     |          |                                  |sdpagent.lib\n\n
>     | comment  |Trying subrule: [cpp_comment]     |
>     |cpp_commen|Trying rule: [cpp_comment]        |
>     |cpp_commen|Trying production: [m{//\s*(.*)}] |
>     |cpp_commen|Trying terminal: [m{//\s*(.*)}]   |
>     |cpp_commen|Didn't match terminal             |
>     |cpp_commen|                                  |TARGET CNetB.dll\nTARGETTYPE
>     |          |                                  |dll\nUID 0x1E5E .
>     |          |                                  |sdpagent.lib\n\n
>
> Does the rightmost column hold the content of $text? Why does it first say "Trying production" and show \nTARGET, but below it says "Didn't match" and shows TARGET without the newline? I would expect it the other way around: before the production is tried, the stuff matching $skip is removed, isn't it? So it should actually show TARGET, not \nTARGET there.

Um, the technical term for this is "hell if I know". If you are irritated by it now, imagine how irritating it gets when the file you are parsing is HUGE and you get the whole thing for each and every attempt... (I modified the PR::D source to truncate the output because of this.)

It takes some time to learn to read the trace dump, but it is well worth the effort. You don't have to like or even completely understand the output for it to be useful. Just watch it progress, and when it seems to be misbehaving, study the resulting productions and it will usually come to you quickly.

> > 7) the path delimiter in windoze (you poor soul...) is '\' not '/'
>
> I'm porting that mess to Unix :-) So I'll leave the / there.

That's great, but your test case had windoze filenames in it...

> >     assignment: keyword <skip: '[ \t]*'> value(s) <skip: $item[2]>
> >         { push @::keyword, $item{keyword}; push @::value, join ' ', @{$item{'value'}}; 1; }
>
> Is restoring the $skip = '\s*' above really needed?

Well, um, actually, no...

> And finally my biggest problem right now - the keyword value(s) rule is too greedy and consumes the "// added on 01.01.2002" comment, as if it were files:
>
>     'LIBRARY' => [ 'euser.lib', 'efsrv.lib', 'c32.lib', '//', 'added',
Re: keyword value(s) newline
Oops. I forgot one important thing that I actually did, and that I forgot to include in my description. For the sake of completeness, my actual working script is at the very bottom.

On Wednesday, May 12, 2004 Ron D. Smith said:

> There are six problems that you have.
>
> 1) you did not read the section on skip very carefully.
> 2) you do not have balanced delimiters in your parse description
> 3) the second production in the chunk rule does not eliminate leading newlines
> 4) You did not look at the trace results that you printed out very carefully; if you had, you would have noticed that PR::D was not consuming your *entire* input description.
> 5) the item hash does not include the modifiers in the name space.
> 6) /^SOURCE/ is a subset of /^SOURCEPATH/
> 7) the path delimiter in windoze (you poor soul...) is '\' not '/'
>
> OK, so that's seven, but I didn't expect the Spanish Inquisition.
>
> By making these changes I was able to completely parse your test case.
>
> On Wednesday, May 12, 2004 [EMAIL PROTECTED] said:
>
> > Sorry, small typo in my mail - the startrule is actually called mmpfile in my script. So the (non-working) script is:
> >
> > -Original Message-
> > From: ext [mailto:[EMAIL PROTECTED]
> >
> > $parser = Parse::RecDescent->new(q(
> > mmpfile: chunk(s) /^\Z/
> > chunk: comment | <skip: /[ \t]*/> assignment | <error>
>
> chunk: comment | assignment | <error>
>
> > comment: c_comment | cpp_comment
> > cpp_comment: m{//([^\n]*)} { push @::cpp_comment, $1; 1; }
> > c_comment: m{/[*](.*?)[*]/}s { push @::c_comment, $1; 1; }
> > assignment: keyword value(s) /\n/ {
>
> assignment: keyword <skip: '[ \t]*'> value(s) /\n/ {
> #---^ ^---
> assignment: keyword <skip: '[ \t]*'> value(s) <skip: $item[2]> {
>
> > push @::keyword, $item{keyword};
> > push @::value, join ' ', @{$item{'value(s)'};
>
> push @::value, join ' ', @{$item{'value'}};
>
> > 1; }
> > value: file | type | uid
> > file: m{[\w\\/.-]+}
>
> file: m{[\w\\.-]+}
>
> > type: /APP/i | /DLL/i
> > uid: /0x[0-9A-F]+/i
> > keyword: /^AIF/im | /^DOCUMENT/im | /^LANG/im | /^LIBRARY/im | /^RESOURCE/im | /^SOURCEPATH/im | /^SOURCE/im | /^SYSTEMINCLUDE/im | /^TARGETPATH/im | /^TARGETTYPE/im | /^TARGET/im | /^UID/im | /^USERINCLUDE/im
> > )) or die 'Bad grammar';
> > $text .= $_ while (<DATA>);
> > defined $parser->mmpfile($text) or die 'bad text';
> > __DATA__
> > TARGET CNetB.dll
> > TARGETTYPE dll
> > UID 0x1e5e 0x102F43DB
> > SOURCEPATH ..\NetBSrc
> > SOURCE CNetB.cpp CNetBSerialBase.cpp CNetBBluetoothModule.cpp
> > SOURCE CSnakeActiveWrapper.cpp
> > USERINCLUDE ..\NetBInc
> > SYSTEMINCLUDE \Epoc32\include \Epoc32\include\oem
> > LIBRARY euser.lib efsrv.lib c32.lib // added on 01.01.2002
> > LIBRARY esock.lib bluetooth.lib btdevice.lib btmanclient.lib
> > LIBRARY btextnotifiers.lib sdpagent.lib
> > /*
> > START WINS
> > BASEADDRESS 0x4620
> > END
> > #if ( (defined ( WINS ) ) || ( defined (WINSCW) ) )
> > SOURCEPATH ..\SwImp\src
> > SOURCE CApiCamSpecsImpSw.cpp
> > #else
> > SOURCEPATH ..\Mirage1\src
> > SOURCE CApiCamHandlerImpMirage1.cpp
> > #endif
> > */
> >
> > And the error message is here (for some reason keyword doesn't match):
> >
> >     |assignment|Trying subrule: [keyword]       |
> >     |assignment|Didn't match subrule: [keyword] |

    use strict;
    use vars qw($parser $text @c_comment @cpp_comment @keyword @value);
    use Parse::RecDescent;
    $RD_WARN=1;
    $RD_HINT=1;
    $RD_TRACE = 1;
    $parser = Parse::RecDescent->new(q(
        file: chunk(s) /^\Z/
        chunk: comment | assignment | <error>
        comment: c_comment | cpp_comment
        cpp_comment: m{//([^\n]*)} { push @::cpp_comment, $1; 1; }
        c_comment: m{/[*](.*?)[*]/}s { push @::c_comment, $1; 1; }
        assignment: keyword <skip: '[ \t]*'> value(s) <skip: $item[2]>
                    { push @::keyword, $item{keyword}; push @::value, join ' ', @{$item{'value'}}; 1; }
        value: file | type | uid
        file: m{[\w.-]+}
        type: /APP/i | /DLL/i
        uid: /0x[0-9A-F]+/i
        keyword: /^AIF/im | /^DOCUMENT/im | /^LANG/im | /^LIBRARY/im | /^RESOURCE/im | /^SOURCEPATH/im | /^SOURCE/im | /^SYSTEMINCLUDE/im | /^TARGETPATH/im | /^TARGETTYPE/im | /^TARGET/im | /^UID/im | /^USERINCLUDE/im
    )) or die 'Bad grammar';
    $text .= $_ while (<DATA>);
    defined $parser->file($text) or die 'bad text';
    __DATA__
    TARGET CNetB.dll
    TARGETTYPE dll
    UID 0x1e5e 0x102F43DB
    SOURCEPATH ..\NetBSrc
    SOURCE
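Point 6 in the list (/^SOURCE/ is a subset of /^SOURCEPATH/) is easy to check in isolation. The sketch below uses ordinary Perl regex alternation, which like a list of productions takes the first alternative that matches; the variable names are made up for the example.

```perl
use strict;
use warnings;

my $line = 'SOURCEPATH ..\NetBSrc';

# The shorter keyword listed first wins, even though the longer one fits better.
my ($short_first) = $line =~ /^(SOURCE|SOURCEPATH)/i;

# Listing the longer keyword first avoids the problem ...
my ($long_first) = $line =~ /^(SOURCEPATH|SOURCE)/i;

# ... and so does demanding a word boundary after each keyword.
my ($bounded) = $line =~ /^(SOURCE\b|SOURCEPATH\b)/i;

print "short first: $short_first\n";
print "long first:  $long_first\n";
print "bounded:     $bounded\n";
```

The keyword rule in the scripts above takes the first route: SOURCEPATH is listed before SOURCE, and TARGETPATH and TARGETTYPE before TARGET.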
Re: Subrule ordering
On Friday, Jun 13, 2003 Richard Jelinek said:

> Hi Descendants of Rec.
>
> Subrule/production ordering. I just don't get it. Given this grammar
>
>     return new Parse::RecDescent (q{
>         meaning:  proplist
>                   { if(length($text)) { print "$text remains unparsed.\n"; return undef; } 1; }
>                 | { return undef; }
>         property: phrase '(' proplist ')' | '(' proplist ')' | '~' property | phrase
>         proplist: property xor_prop(s?) | property and_prop(s?)
>         xor_prop: '|' property
>         and_prop: ',' property
>         phrase:   /[^\(\)\=\,\|\\~]+/
>     });
>
> when I try to parse 'SYN(d(e|f))' everything goes well. If the string is 'SYN(d(e,f))', a syntax error is spilled out. If I swap the first two subrules/productions of proplist, the situation is vice versa.

Ordering is how you establish priority in ambiguous situations. If you turn on $::RD_TRACE=100 you will see what the parser is doing. If you follow the second SYN(d(e,f)) case you will see that the parser is correct: the grammar you describe does not parse that input. It's not a problem with the order, it's a fundamental problem with your grammar. So it's not that you do not understand the ordering rules, it's that you do not understand your grammar.

This successfully parses both, but I'm not sure it's what you want.

    $::RD_TRACE=100;
    $it = new Parse::RecDescent (q{
        meaning:     proplist
                     { if(length($text)) { print "$text remains unparsed.\n"; return undef; } 1; }
                   | { return undef; }
        property:    phrase '(' proplist ')' | '(' proplist ')' | '~' property | phrase
        proplist:    property xorand_prop(s?)
        xorand_prop: xor_prop | and_prop
        xor_prop:    '|' property
        and_prop:    ',' property
        phrase:      /[^\(\)\=\,\|\\~]+/
    });
    $it->meaning('SYN(d(e|f))');
    $it->meaning('SYN(d(e,f))');

> I found nowhere in the Parse::RecDescent docs that the ordering of productions matters. But it seems it does. Unless there is something blatantly evident that I've overlooked, this makes writing and maintaining these grammars harder than it could be.
>
> --
> best regards,
> Dipl.-Inf. Richard Jelinek - The PetaMem Group - Prague/Nuremberg - www.petamem.com - -= 2325182 Mind Units =-
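Why the original proplist never reaches its second production can be shown with a stripped-down imitation in plain Perl. This is a sketch with assumptions: parse_group is a hypothetical function, properties are reduced to bare words, and the group is matched terminal by terminal with no backtracking. The repetition after the first property accepts zero matches, so the "first production" always succeeds and the failure only shows up later at the closing bracket.

```perl
use strict;
use warnings;

# Match "(" word ( SEP word )* ")" one terminal at a time, never going back.
sub parse_group {
    my ($text, $sep_re) = @_;
    pos($text) = 0;
    $text =~ /\G\(/gc  or return 0;
    $text =~ /\G\w+/gc or return 0;
    1 while $text =~ /\G$sep_re\w+/gc;     # sep_prop(s?): zero matches still counts as success
    return $text =~ /\G\)\z/gc ? 1 : 0;    # this is where a leftover ",f" makes the parse die
}

print parse_group('(e|f)', qr/\|/),   "\n";    # separator is '|' only
print parse_group('(e,f)', qr/\|/),   "\n";    # same rule, comma-separated input
print parse_group('(e,f)', qr/[|,]/), "\n";    # merged rule accepts either separator
```

The merged separator pattern plays the role of the xorand_prop rule in the reply.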
Re: binary shifts in p::rd exe block
On Monday, Jun 9, 2003 h. w. neff said:

> ok, i now have a much pared down example that exhibits the problem -- at least with attempts to use '<<' in an exe block.
> <blessedly snipped>

Wow. If that was pared down, I would hate to see the whole script :-)

This one is kind of subtle, and reminds me of the quote "nothing can parse Perl except perl". The problem is in Text::Balanced::_match_quotelike, where it thinks the '<<' is a hereis (here-document) operator instead of a shift and tries to process it like a quoting operator. Since nothing following the '<<' really corresponds to a label, it looks for "" instead. See Text::Balanced::_match_quotelike starting at line 714:

    if ($op eq '<<') {
        $ld1pos = pos($$textref);
        my $label;
        if ($$textref =~ m{\G([A-Za-z_]\w*)}gc) {
            $label = $1;
        }
        elsif ($$textref =~ m{ \G ' ([^'\\]* (?:\\.[^'\\]*)*) '
                             | \G " ([^"\\]* (?:\\.[^"\\]*)*) "
                             | \G ` ([^`\\]* (?:\\.[^`\\]*)*) `
                             }gcx) {
            $label = $+;
        }
        else {
            $label = "";
        }

What happens is that $label is set to the empty string, so the quote terminator it looks for is a BLANK LINE. Here is a much simpler test case which fails:

    use Parse::RecDescent;
    use Data::Dumper;

    $::RD_ERRORS = 1; # Make sure the parser dies when it encounters an error
    $::RD_WARN   = 1; # Enable warnings. This will warn on unused rules &c.
    $::RD_HINT   = 1; # Give out hints to help fix problems.
    $::RD_TRACE=100;

    $mach_grammar = q {
    acu_config_cmd : ';'
        {
            $main::Instruction_lower = 0x00 | (($item[16] & 0x03)<<10) ;
            1;
        }
    };
    my $mach_parser = new Parse::RecDescent ($mach_grammar);

It outputs the following. Note that the action is never recognized:

    Parse::RecDescent: Treating acu_config_cmd : as a rule declaration
    Parse::RecDescent: Treating ; as a literal terminal
    printing code (3612) to RD_TRACE

But this works:

    use Parse::RecDescent;
    use Data::Dumper;

    $::RD_ERRORS = 1; # Make sure the parser dies when it encounters an error
    $::RD_WARN   = 1; # Enable warnings. This will warn on unused rules &c.
    $::RD_HINT   = 1; # Give out hints to help fix problems.
    $::RD_TRACE=100;

    $mach_grammar = q {
    acu_config_cmd : ';'
        {
            $main::Instruction_lower = 0x00 | (($item[16] & 0x03)<<10) ;

            1;
        }
    };
    my $mach_parser = new Parse::RecDescent ($mach_grammar);

Now PAY ATTENTION: the only difference between the two cases is the *BLANK* line after the expression with the '<<' in it. Yet this version works and correctly picks up the action:

    Parse::RecDescent: Treating acu_config_cmd : as a rule declaration
    Parse::RecDescent: Treating ; as a literal terminal
    Parse::RecDescent: Treating { $main::Instruction_lower = 0x00 | (($item[16] & 0x03)<<10) ; 1; } as an action
    printing code (4229) to RD_TRACE

The reason for the bug is that, with no blank line to end the bogus hereis literal, the match consumes the entire input, and the parser fails in a bizarre way that makes it think everything after the '<<' is part of that literal. So to fake it out, give it what it wants, which is a blank line after every usage of the '<<' shift operator, but you have to do it INSIDE the action. The original code had the blank line after the action, and the grammar parser loses its mind. So with your original code, change it to this:

    acu_config_cmd :
        bus0_p_see_clause
            { $main::LastSuccess = "a bus0 'ptr see clause'";
              $main::LookingFor  = "a comma and a bus0 'update clause'"; }
        bus0_update_clause
            { $main::LastSuccess = "a bus0 'update clause'";
              $main::LookingFor  = "a comma"; }
        ','
            { $main::LastSuccess = "a comma";
              $main::LookingFor  = "a bus1 'ptr see clause'"; }
        bus1_p_see_clause <commit>
            { $main::LastSuccess = "a bus1 'ptr see clause'";
              $main::LookingFor  = "a comma and a bus1 'update clause'"; }
        bus1_update_clause
            { $main::LastSuccess = "a bus1 'update clause'";
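For what it's worth, another way to sidestep the '<<' misparse is to keep the shift out of the grammar text altogether, so Text::Balanced never sees anything that looks like a hereis operator. This is only a sketch and is not taken from the thread: the helper names shl and shr are hypothetical, and the value 0x07 merely stands in for $item[16].

```perl
use strict;
use warnings;

# Hypothetical helpers (illustrative names, not part of Parse::RecDescent).
# They are compiled as ordinary Perl, outside the q{...} grammar string, so
# the grammar text never contains a literal '<<' or '>>'.
sub shl { my ($value, $bits) = @_; return $value << $bits; }
sub shr { my ($value, $bits) = @_; return $value >> $bits; }

# Same arithmetic as the acu_config_cmd action: mask a field to its low two
# bits and move it up to bit 10.
my $field             = 0x07;                          # stand-in for $item[16]
my $instruction_lower = 0x00 | shl($field & 0x03, 10);

printf "0x%04x\n", $instruction_lower;                 # prints 0x0c00
print shr($instruction_lower, 10), "\n";               # prints 3
```

Inside an action the call would need to be spelled main::shl($item[16] & 0x03, 10), because actions run in the parser's own package rather than in main. The blank-line trick above is still the smaller change; this just avoids having to remember it at every shift.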
Re: binary shifts in p::rd exe block
On Thursday, Jun 5, 2003 h. w. neff said:

> hi. in an attempt to shift merge returned results, i find i cannot use '>>' or '<<' in a p::rd executable block; i must instead divide and multiply, respectively. e.g. in this code, 'goodrule' works, 'badrule' does not:
>
>     sub_rule : /[0-7]/
>     goodrule : /reg1/i '=' sub_rule ',' /reg2/i '=' sub_rule ';'
>         { $main::machineword = $item[3] * 8 | $item[7]; 1 }
>     badrule : /reg1/i '=' sub_rule ',' /reg2/i '=' sub_rule ';'
>         { $main::machineword = $item[3] << 3 | $item[7]; 1 }
>
> by 'not working' i mean i get undefined sub-rules and such as if i was getting file/stream type redirection instead of the shifts i want.

I suspect you have not correctly analyzed the problem. Here is my test case:

    #!/usr/intel/pkgs/perl/5.005_03/bin/perl
    BEGIN {$::RD_HINT=1; $::RD_TRACE=100;}
    use lib "my version of 1.80 of P::RD";
    use Parse::RecDescent;

    my $grammar = q {
    sub_rule : /[0-7]/
    goodrule : /reg1/i '=' sub_rule ',' /reg2/i '=' sub_rule ';'
        { $main::machineword = $item[3] * 8 | $item[7];
          print "item 3 $item[3] $item[7] $main::machineword\n";
          1 }
    badrule : /reg1/i '=' sub_rule ',' /reg2/i '=' sub_rule ';'
        { $main::machineword = $item[3] << 3 | $item[7];
          print "item 3 $item[3] $item[7] $main::machineword\n";
          1 }
    };
    my $parserRef = new Parse::RecDescent($grammar);
    print "Good rule returns:",$parserRef->goodrule("reg1 = 1 , reg2 = 2;"), " '$main::machineword' \n";
    print "Bad rule returns: ",$parserRef->badrule("reg1 = 1 , reg2 = 2;"), " '$main::machineword'\n";
    exit;

I get the following:

    unix> test.pl
    Parse::RecDescent: Treating sub_rule : as a rule declaration
    Parse::RecDescent: Treating /[0-7]/ as a /../ pattern terminal
    Parse::RecDescent: Treating goodrule : as a rule declaration
    Parse::RecDescent: Treating /reg1/i as a /../ pattern terminal
    Parse::RecDescent: Treating = as a literal terminal
    Parse::RecDescent: Treating sub_rule as a subrule match
    Parse::RecDescent: Treating , as a literal terminal
    Parse::RecDescent: Treating /reg2/i as a /../ pattern terminal
    Parse::RecDescent: Treating = as a literal terminal
    Parse::RecDescent: Treating sub_rule as a subrule match
    Parse::RecDescent: Treating ; as a literal terminal
    Parse::RecDescent: Treating { $main::machineword = $item[3] * 8 | $item[7]; print "item 3 $item[3] $item[7] $main::machineword\n"; 1 } as an action
    Parse::RecDescent: Treating badrule : as a rule declaration
    Parse::RecDescent: Treating /reg1/i as a /../ pattern terminal
    Parse::RecDescent: Treating = as a literal terminal
    Parse::RecDescent: Treating sub_rule as a subrule match
    Parse::RecDescent: Treating , as a literal terminal
    Parse::RecDescent: Treating /reg2/i as a /../ pattern terminal
    Parse::RecDescent: Treating = as a literal terminal
    Parse::RecDescent: Treating sub_rule as a subrule match
    Parse::RecDescent: Treating ; as a literal terminal
    Parse::RecDescent: Treating { $main::machineword = $item[3] << 3 | $item[7]; print "item 3 $item[3] $item[7] $main::machineword\n"; 1 } as an action
    printing code (21354) to RD_TRACE
    | goodrule |Trying rule: [goodrule]                 |
    | goodrule |                                        |reg1 = 1 , reg2 = 2;
    | goodrule |Trying production: [/reg1/i '=' sub_rule|
    |          |',' /reg2/i '=' sub_rule ';']           |
    | goodrule |Trying terminal: [/reg1/i]              |
    | goodrule |Matched terminal (return value:         |
    |          |[reg1])                                 |
    | goodrule |                                        | = 1 , reg2 = 2;
    | goodrule |Trying terminal: ['=']                  |
    | goodrule |Matched terminal (return value: [=])    |
    | goodrule |                                        | 1 , reg2 = 2;
    | goodrule |Trying subrule: [sub_rule]              |
    | sub_rule |Trying rule: [sub_rule]                 |
    | sub_rule |Trying production: [/[0-7]/]            |
    | sub_rule |Trying terminal: [/[0-7]/]              |
    | sub_rule |Matched terminal (return value: [1])    |
    | sub_rule |                                        | , reg2 = 2;
    | sub_rule |Matched production: [/[0-7]/]           |
    | sub_rule |Matched rule (return value: [1])        |
    | sub_rule |(consumed: [ 1])                        |
    | goodrule |Matched subrule: [sub_rule] (return     |
    |          |value: [1]                              |
    | goodrule |Trying terminal: [',']                  |
    | goodrule |Matched terminal (return value: [,])    |
    | goodrule |                                        | reg2 = 2;
    | goodrule |Trying terminal: [/reg2/i]              |
    | goodrule |Matched terminal (return value:         |
    |          |[reg2])                                 |