Re: help: defining grammer?

2004-10-26 Thread Ron D. Smith
What you want to do is hard to solve generically, and you will find lots 
and lots of literature on program translation.

That being said, you should break this up into a two-step process.  Step one 
is to parse the generic language (which looks an *awful* lot like perl... 
:-) into a data structure which encapsulates a generic view of the 
constructs.  Step two takes that data structure and produces the 
appropriate output.

For example, $var = 10

assignment: lhs = expression

lhs: scalar | array | hash

scalar: /\$[\w]+/

expression: {whole expression tree} ... atom ...

atom: constant ...

constant: string_constant | integer

integer: /[-+]?\d+/

Once you have the data structure, the second step walks it.  To produce 
scheme, the program will see that you are assigning a constant to a variable 
and emit the Define var1 = 10 string, etc.

This really is two problems, and each one should be thought out separately.

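A minimal sketch of the two steps in plain Perl, without P::RD, may help. Everything here is an assumption made for illustration: the function names, the shape of the hash, and the exact output strings (real Scheme and Tcl syntax rather than the pseudo-output in the question). Only scalar = integer assignments are handled.

```perl
use strict;
use warnings;

# Step one: parse one line of the generic language into a neutral structure.
# Only "$name = integer" is recognised in this sketch.
sub parse_assignment {
    my ($line) = @_;
    if ($line =~ /^\s*\$(\w+)\s*=\s*([-+]?\d+)\s*$/) {
        return { type  => 'assign',
                 name  => $1,
                 value => { type => 'integer', value => $2 } };
    }
    return;    # no match
}

# Step two: one emitter per target language, all reading the same structure.
my %emit = (
    scheme => sub { my ($n) = @_; "(define $n->{name} $n->{value}{value})" },
    tcl    => sub { my ($n) = @_; "set $n->{name} $n->{value}{value}" },
);

sub translate {
    my ($line, $target) = @_;
    my $node = parse_assignment($line) or return;
    return $emit{$target}->($node);
}

print translate('$var = 10', $_), "\n" for qw(scheme tcl);
```

Adding a target language then means adding one entry to %emit; the parsing step is untouched.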
On Monday, Oct 25, 2004 Prentice, Phillip R said:

 Hello,
 
 I was wondering if anyone could help me begin to write some grammar for
 a conversion project.  I basically have a generic language where
 variables, arrays, and hashes are defined.  I want these variables to be
 translated to a tool-specific data structure, e.g. TCL, scheme, etc.  I am
 assuming I would have to create a different set of grammar(s) for each
 language I want the variables translated to.  However, I am having
 trouble defining the grammar for parsing the GenericLanguage.  Could
 you help me get started by showing me how I would go about parsing the
 3 structures in the GenericLanguage section below?  I would
 appreciate any insight or suggestions.
 
 Thanks in advance,
 
 -Phillip
 
 For example:
 
 __GenericLanguage__
 
 $var = 10
 
 $var1 = variable1
 
 @var2 = [1,2,3]
 
 %var3 = {'key1'='value1', 'key2'='value2'}
 
 __Translate-to-SchemeLanguage__
 
 Define var1 = variable1
 
 Define var2 = '(1 2 3)
 
 ...
 
 __Translate-to-TCLLanguage__
 
 Set var1 variable
 

--
 Intel, Corp.
 5000 W. Chandler Blvd.
 Chandler, AZ 85226





Re: evaluating an expression while parsing

2004-07-15 Thread Ron D. Smith
On Thursday, Jul 15, 2004 Ted Zlatanov said:

 On Wed, 14 Jul 2004, [EMAIL PROTECTED] wrote:
 
  I think it is better to create an executable and then execute it.
 
 Isn't this a lot harder to debug?  It seems like dumping the contents
 of a data structure is a lot easier than dumping subroutines.  With
 the high rate of P::RD initial grammar bugs, at least in my
 experience, this is an important consideration.
 
 Ted

Debugging *is* an issue, but it's a small one.  I find it helpful to first 
focus on the parsing itself (i.e. just get the *grammar* right) and then go 
back and retrofit the actual application.

As far as debugging goes, you can still debug this using the perl debugger 
just like you debug anything else.  The only difference is that it is a bit 
more challenging to insert a breakpoint.  To insert a breakpoint in some bit 
of buried subroutine, simply insert the following statement:

$DB::single=1;

and voila, you will break there with all of the access you normally have in 
the debugger.
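A small sketch of the trick in isolation; the subroutine name is made up and merely stands in for an action buried in generated parser code. Run normally, the assignment does nothing; run under perl -d, execution stops at the statement after it.

```perl
use strict;
use warnings;

# Stand-in for an action buried inside generated parser code.
sub buried_action {
    my ($value) = @_;
    no warnings 'once';    # silences a possible "used only once" warning outside the debugger
    $DB::single = 1;       # under "perl -d" the debugger stops at the next statement
    return $value * 2;
}

print buried_action(21), "\n";
```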

IMHO, if you are looking for ease of debug, you need to avoid PR::D 
altogether... ;-)

--
 Intel, Corp.
 5000 W. Chandler Blvd.
 Chandler, AZ 85226





Re: rebuilding a string from the autotree datastructure.

2004-06-15 Thread Ron D. Smith
On Tuesday, Jun 15, 2004 David Holden said:
 Hello,
 
  I would like to rebuild a parsed string from the datastructure given by 
 autotree.
 
 For example
 
 
 Given the date string (2001).
 
 
 using the following rules:-
 
Date: lft_bracket Year YearLabel(?) rgt_bracket point
 
Year: /\d{4}/
YearLabel: /[a-z]/
point: '.'
lft_bracket: '('
rgt_bracket: ')'
 
 
 autotree gives the following datastructure
 
 $VAR1 = bless( {
   'Year' => bless( {
     '__VALUE__' => '2001'
   }, 'Year' ),
   '__RULE__' => 'Date',
   'point' => bless( {
     '__VALUE__' => '.'
   }, 'point' ),
   'lft_bracket' => bless( {
     '__VALUE__' => '('
   }, 'lft_bracket' ),
   'rgt_bracket' => bless( {
     '__VALUE__' => ')'
   }, 'rgt_bracket' ),
   'YearLabel(?)' => []
 }, 'Date' );
 
 I would like to be able to reconstruct the string (2001). from this 
 structure.  The problem I have is that because it is a HASH it does not have 
 order information, e.g. in the above the Year key comes before lft_bracket.
 
 Am I missing something?

No.

Basically what you want to do is unparse the data.  The unparser will 
need to know as much about the syntax as the parser.

autotree gives you a best guess as to what you need, but if you want to 
have a structure that contains ordering information, you will need to take 
control of the structure that gets created.

To do so, you do not need to abandon autotree:

use strict;
use vars qw($parser $text %top);
use Data::Dumper;
use Parse::RecDescent;
$RD_WARN  = 1;
$RD_HINT  = 1;
$RD_TRACE = 120;
use constant GRAMMAR => q(

<autotree>
Date: lft_bracket Year YearLabel(?) rgt_bracket point [EMAIL PROTECTED]

   Year: /\d{4}/
   YearLabel: /[a-z]/
   point: '.'
   lft_bracket: '('
   rgt_bracket: ')'

);

$parser = Parse::RecDescent->new(GRAMMAR) or die 'Bad grammar';
$text .= $_ while (<DATA>);
print Dumper($parser->Date($text));

__DATA__
(2001).
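
Independently of P::RD, the idea of an order-preserving structure and its unparser can be sketched in plain Perl. The tree layout below (an array per node, rule name first, children after it) is an assumption chosen for illustration, not what autotree produces.

```perl
use strict;
use warnings;

# Hypothetical order-preserving tree for "(2001).": each node is
# [ rule_name, child, child, ... ] and a leaf child is a plain string.
my $tree = [ 'Date',
    [ 'lft_bracket', '(' ],
    [ 'Year',        '2001' ],
    # YearLabel(?) matched zero times, so it contributes no child
    [ 'rgt_bracket', ')' ],
    [ 'point',       '.' ],
];

# Unparse: walk the children in stored order and concatenate the leaves.
sub unparse {
    my ($node) = @_;
    return $node unless ref $node;    # leaf: the matched text itself
    my ($rule, @children) = @$node;
    return join '', map { unparse($_) } @children;
}

print unparse($tree), "\n";
```

Whitespace that the parser skipped between terminals is not stored in such a tree, so the round trip reproduces only the matched tokens.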


 
  thanks in advance.
 
 Dave.
 
 
 
 
 
 
 -- 
 Dr. David Holden.
 
 Thanks in advance:-
 Please avoid sending me Word or PowerPoint attachments.
 See: http://www.fsf.org/philosophy/no-word-attachments.html
 
 Show me your papers..: http://www.no2id.net/index.html
 
 Public GPG key available on request.
 -


--
 Intel, Corp.
 5000 W. Chandler Blvd.
 Chandler, AZ 85226





Re: Parsing comments again (cpp linemarkers)

2004-06-03 Thread Ron D. Smith
I do not have an easy answer for you, perhaps one of the Wizards will.

On Thursday, Jun 3, 2004 [EMAIL PROTECTED] said:

 Hi,
 
 I'm almost finished with the script (the stripped down test case 
 is on the bottom of this mail) which parses cpp-preprocessed output.
 
 The peculiarity of parsing cpp-output is that if the cpp-processed
 file #includes a file, then cpp prints a linemarker line, like:
 
   # 1 "..\\mmstorageiface\\group\\bld.inf" 1

So, in other words, this can appear at the top-most parsing level.

 
 In my script I keep track of the current working directory and
 update it (in the updateCurdir() below) whenever a linemarker
 rule (with flags 1 or 2) triggers. The script works fine for
 the most of my input files, but there are few that break it.
 
 There are 5 kind of sections started by the section terminal.
 And the problem comes when the input file #includes another
 file, while being in the middle of one section, like here:
 
   PRJ_MMPFILES
   MMSTORAGEIFACE.MMP
   ..\GROUP\PNMSG.MMP
   # 72 "bld.inf" 2
   makefile ..\group\convcolor.mk

So in other words, this can *also*  appear INSIDE a subordinate parsing level.

H...

 
 The section above is started by the PRJ_MMPFILES terminal,
 then 2 paths to .mmp files are coming and then (unexpected)
 the cpp-linemarker comes before the last path.
 
 What could I do now to solve this problem, please?

Seems to me like there is no obvious solution except to put it in two 
places :-).

 
 I can't just skip the linemarker comments as in P::RD::FAQ.
 
 I also don't want to prepend the linemarker subrule to each
 of the 5 section rules, like here:
 
   prj_mmpfile: /PRJ_MMPFILES\b/i 
   (linemarker | mmpfile[\%::prj_mmpfiles])(s?)

OK, so then put the rule in mmpfile, it'll catch all five cases...

mmpfile: makefile(?) path special(?) {
    ::storeMmpPath($arg[0], $item[1]->[0], $item{path}, $item[-1]->[0]);
} | linemarker

 
 because that would make my script really unreadable and

Unreadability is in the eye of the beholder.  I like using white space and 
alignment to make things readable, but that is just MHO.

 because I'll have to add an if-statement into each of the
 5 actions in order to distinguish if it was triggered by
 another linemarker or by the normal section entry.

Yep, sounds like you will, because the semantics subtly change between the 
two cases and this sounds like a pathologic thing to me based on the 
semantics of the input data.

 
 Any advices please?

Since I'm talking and I can't shut up, I have to offer my opinion about your 
coding style wrt(1) side effects.

Inside of parsing rules, you update your internal tables early (i.e. the 
chdir rule triggers an update as soon as it's encountered).  An action runs 
as soon as the parser reaches it, even if the enclosing production later 
fails.  This can be fine or it can cause unfortunate side effects which 
become parse dependent, particularly if you use negative look-ahead.  There 
is *bound* to be one single corner case in some friggin' windoze file 
somewhere that will cause you to spend hours in debug before you end up 
slapping your forehead and shouting "Doh!".

I would advise that all side effects be implemented at the highest possible 
level i.e. when you are Real Sure that you have the right data.  At that 
point it is safe to update the tables.  You are not making your own data 
structure, which is OK, but it can make it somewhat complicated when you go 
back to make some kind of data dependent update.  I would recommend that all 
side effects be implemented in the innfile rule because it is the only place 
that you are Real Sure of what you have.
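
The difference between updating early and updating at the top can be shown with a toy example in plain Perl. Nothing here comes from the script under discussion: the table, the two subroutines and the "reject any non-word token" rule are all made up for the illustration.

```perl
use strict;
use warnings;

my @table;    # stands in for the tables the real script updates

# Eager: record each word as soon as it is seen.
sub parse_eager {
    my ($line) = @_;
    for my $word (split ' ', $line) {
        push @table, $word;               # side effect fires immediately
        return 0 if $word !~ /^\w+$/;     # the line is rejected, the update stays
    }
    return 1;
}

# Deferred: collect locally and commit once the whole line is accepted.
sub parse_deferred {
    my ($line) = @_;
    my @seen;
    for my $word (split ' ', $line) {
        return 0 if $word !~ /^\w+$/;
        push @seen, $word;
    }
    push @table, @seen;                   # single commit point
    return 1;
}

@table = ();
parse_eager('alpha beta ?!');
print "eager leaves:    @table\n";

@table = ();
parse_deferred('alpha beta ?!');
print "deferred leaves: @table\n";
```

The eager version leaves entries behind from a line it ultimately rejected; the deferred version leaves the table untouched.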

Also another coding trick that makes it easy to maintain code is to have 
keyword rules programmable:

use constant GRAMMAR => q(
{my @makefile_keywords=qw(GNUMAKEFILE MAKEFILE NMAKEFILE);

...

makefile: /\w+\b/ 
{
  # the keywords are stored in upper case, so fold the match before comparing
  if (grep(uc($item[1]) eq $_,@makefile_keywords)) {
    $item[1]
  } else {
    undef
  }
}

...

This way when you discover that you have to add Just One More Keyword, you 
just put it in the appropriate list.
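
The same idea outside a grammar, as a rough sketch: a lookup table built from one word list. The hash and subroutine names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical keyword table; adding a keyword means adding one word here.
my %is_makefile_keyword = map { $_ => 1 } qw(GNUMAKEFILE MAKEFILE NMAKEFILE);

# Return the word unchanged if it is a keyword (in any case), else undef.
sub makefile_keyword {
    my ($word) = @_;
    return $is_makefile_keyword{ uc $word } ? $word : undef;
}

for my $word (qw(makefile NMakefile makefiles)) {
    my $hit = makefile_keyword($word);
    printf "%-10s %s\n", $word, defined $hit ? 'keyword' : 'not a keyword';
}
```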

(1) wrt: With Respect To - at Intel we love TLAs - Three Letter Acronyms

 
 Regards
 Alex
 
 

--
 Intel, Corp.
 5000 W. Chandler Blvd.
 Chandler, AZ 85226





Re: grammar problems

2004-05-21 Thread Ron D. Smith
It's massively helpful if you provide a complete (and working! - i.e. 
compiles) example.

I would encourage you to go back to the PR::D POD and follow the examples for 
this kind of thing VERY CLOSELY.

The (a) problem is that ident does not disallow keywords:

use strict;
use vars qw($parser $text %top);
use Data::Dumper;
use Parse::RecDescent;
$RD_WARN  = 1;
$RD_HINT  = 1;
$RD_TRACE = 120;
my $grammar = q
{
<autotree>
disj  : conj disjOp disj | conj
conj  : term conjOp(?) conj | term
term  : qualif(?) term2
term2 : brack | phrase | ident
brack : '(' disj ')'
phrase: '"' ident(s?) '"'
ident : /[a-zA-Z\d]+/ {$return=($item[1]=~/^(or|and)$/i)?undef:$item[1]}
qualif: ident '='
conjOp: /AND/i #| ''
disjOp: /OR/i
};
$parser = Parse::RecDescent->new($grammar);
print Dumper($parser->disj('a or b'));


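This turns out to be critical because of your requirement that the keyword 
operator "and" be optional: once conjOp may be omitted, an ident that accepts 
any word will happily take "or" as just another term.  The action above works 
by failing the rule in the exception case.  Another way to do it is to use 
the lookahead feature.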
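
As a rough illustration of the lookahead idea, here is the same exclusion written as a plain Perl regex with a negative lookahead. This is a sketch of the pattern only, not a tested P::RD grammar; the variable name is made up.

```perl
use strict;
use warnings;

# An identifier that refuses the reserved words, using negative lookahead.
# The \b keeps words that merely start with a keyword (order, android) legal.
my $ident = qr/(?!(?:and|or)\b)[a-zA-Z\d]+/i;

for my $word (qw(a or AND order android)) {
    printf "%-8s %s\n", $word, $word =~ /^$ident$/ ? 'ident' : 'keyword';
}
```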

On Friday, May 21, 2004 Jonas Wolf said:
 Following your recommendation, I have altered the query to read as 
 follows:
 
 my $grammar = q
 {
 <autotree>
 disj  : conj disjOp disj | conj
 conj  : term conjOp(?) conj | term
 term  : qualif(?) term2
 term2 : brack | phrase | ident
 brack : '(' disj ')'
 phrase: '"' ident(s?) '"'
 ident : /[a-zA-Z0-9]+/i
 qualif: ident '='
 conjOp: /AND/i #| ''
 disjOp: /OR/i
 };
 
 This should fix the precedence issue with 'qualif'. However, this shows 
 the same misbehaviour
 as when I tried it. Again, a query 'a or b' is interpreted as 'a AND or 
 AND b'.
 
 Any ideas?
 
 Jonas
 
 
 
 
 
 Jonas Wolf/UK/[EMAIL PROTECTED]
 21-05-04 09:16
  
 To: Ron D. Smith [EMAIL PROTECTED]
 cc: [EMAIL PROTECTED]
 Subject:Re: grammar problems
 
 
 I would like to parse boolean queries like the following:
 <=> means the two queries represent the same structure
 
 this and that <=> this that
 this or that
 a b or c d <=> (a b) or (c d) <=> (a and b) or (c and d)
 "this phrase" word <=> "this phrase" and word
 abstract = error or content = mistake <=> (abstract = error) or (content = mistake)
 ...
 
 I hope that clarifies things.
 
  conj  : term conjOp(?) conj | term
 
 Yes, that seems like a good alternative, I will have to try if it works.
 
 Thanks, Jonas
 
 
  Hello,
  
  I recently downloaded the Parse::RecDescent package to parse
  boolean queries. I have several questions, not necessarily about
  the package itself, but rather about my grammar. 
  
  I am currently using the following grammar:
  
  my $grammar = q
  {
  <autotree>
  disj  : qualif(?) conj disjRec(s?)
   disjRec  : disjOp conj
  conj  : term conjRec(s?)
   conjRec  : conjOp term
  term  : brack | phrase | ident
  brack : '(' disj ')'
  phrase: '"' ident(s?) '"'
  ident : /[a-zA-Z0-9]+/i
  qualif: ident '='
  conjOp: /AND/i
  disjOp: /OR/i
  };
  
  I have several problems with this.
  
  Firstly, the precedence isn't quite what I want in 'qualif'. I would
  like 'ident' to bind tightly to whatever comes next to it. Currently
  it seems to associate it with the whole 'disj' that comes after.
 
 You do not provide any examples of what you are trying to parse, which 
 would 
 help.  IMHO what you mean by the above paragraph is not clear so I don't 
 know 
 what you want.
 
  
  Secondly, I would like the 'conjOp' operator to be optional, and that
  the parser recognises this. This means a query for 'a b' would be
  interpreted as 'a AND b'. I tried replacing the conjOp rule with
  'conjOp : /AND/i | ', but this does not work, as now a query 
  'a AND b' is interpreted as 'a AND and ...'.
 
 Perhaps:
 
 conj  : term conjOp(?) conj | term
 
 It might help if you followed the example given in the PR::D POD more 
 closely.
 
  
  Any help would be appreciated.
  
  Thanks, Jonas.
 
 
 --
  Intel, Corp.
  5000 W. Chandler Blvd.
  Chandler, AZ 85226
 
 
 
 
 
 


--
 Intel, Corp.
 5000 W. Chandler Blvd.
 Chandler, AZ 85226





Re: option: /<option/i /\w+/ optarg(?) '>'

2004-05-20 Thread Ron D. Smith
On Thursday, May 20, 2004 Alexander Farber said:
 Hi,
 
 why doesn't the question mark in the optarg(?)
 below work as I expect? In the trace I see that
 optarg consumes the '>' character and then the
 option rule fails, since the final '>' is missing.
 
 Isn't P::RD supposed to backtrack and say 
 "ok, the optarg(?) didn't match anything here"?

Um, why would it do this?  In fact, optarg *does* match zero or one times; 
it is option that does not match.  This is a simple problem with your 
grammar where you got (exactly...) what you asked for.

As an example, if you change optarg so that it cannot swallow the closing 
'>', it will be better:

optarg: /[^\s>]+/
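
To see why the greedy optarg sinks the rule, the sequence of items can be imitated in plain Perl with \G-anchored matches, where each step either consumes input or fails and no earlier step is retried. This only imitates the behaviour seen in the trace; the subroutine is hypothetical and is not how P::RD is implemented.

```perl
use strict;
use warnings;

# Match "<option NAME [ARG]>" one item at a time.  Each step either consumes
# input or fails, and an earlier step is never revisited.
sub match_option {
    my ($text, $optarg_re) = @_;
    pos($text) = 0;
    $text =~ /\G\s*<option/gci  or return;
    $text =~ /\G\s*(\w+)/gc     or return;
    my $name = $1;
    my $arg;
    $arg = $1 if $text =~ /\G\s*($optarg_re)/gc;   # optarg(?): zero or one
    $text =~ /\G\s*>/gc         or return;         # the closing '>'
    return [ $name, $arg ];
}

for my $re (qr/\S+/, qr/[^\s>]+/) {
    my $m = match_option('<option one>', $re);
    print "$re: ", ($m ? "matched name=$m->[0]" : 'no match'), "\n";
}
```

With /\S+/ the optional step succeeds by eating the '>', so the final step has nothing left to match; with /[^\s>]+/ the optional step matches zero times and the '>' is still there.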

Also, there is more than one way to do it, but I don't like your first item 
for option, /<option/, and would prefer '<' 'option' as it is easier to 
maintain.  If the problem is that there should not be a space between '<' and 
'option', then change skip.

But this is not related to your question, this is just me butting into your 
style... ;-)

 
 Of course the script below works when I change
 the optarg from /\S+/ to /\w+/, but I'm curious,
 why doesn't optarg(?) mean ZERO or one here?
 
 Thank you
 Alex
 
 #!/usr/bin/perl -w
 
 use strict;
 use vars qw($parser $text %option);
 use Data::Dumper;
 use Parse::RecDescent;
 $RD_WARN  = 1;
 $RD_HINT  = 1;
 $RD_TRACE = 120;
 $parser = Parse::RecDescent->new(q(
 
 genfile: chunk(s) /^\Z/
 chunk: option | <error>
 option: /<option/i /\w+/ optarg(?) '>' {
   push @{$::option{lc $item[2]}}, $item[-2];
 } 
 optarg: /\S+/
 
 )) or die 'Bad grammar';
 $text .= $_ while (<DATA>);
 defined $parser->genfile($text) or die 'Bad text';
 print STDERR Data::Dumper->Dump([\%option], [qw(option)]);
 __DATA__
 <option one>
 <option two>
 


--
 Intel, Corp.
 5000 W. Chandler Blvd.
 Chandler, AZ 85226





Re: grammar problems

2004-05-20 Thread Ron D. Smith
On Thursday, May 20, 2004 Jonas Wolf said:
 Hello,
 
 I recently downloaded the Parse::RecDescent package to parse
 boolean queries. I have several questions, not necessarily about
 the package itself, but rather about my grammar. 
 
 I am currently using the following grammar:
 
 my $grammar = q
 {
 <autotree>
 disj  : qualif(?) conj disjRec(s?)
  disjRec  : disjOp conj
 conj  : term conjRec(s?)
  conjRec  : conjOp term
 term  : brack | phrase | ident
 brack : '(' disj ')'
 phrase: '"' ident(s?) '"'
 ident : /[a-zA-Z0-9]+/i
 qualif: ident '='
 conjOp: /AND/i
 disjOp: /OR/i
 };
 
 I have several problems with this.
 
 Firstly, the precedence isn't quite what I want in 'qualif'. I would
 like 'ident' to bind tightly to whatever comes next to it. Currently
 it seems to associate it with the whole 'disj' that comes after.

You do not provide any examples of what you are trying to parse, which would 
help.  IMHO what you mean by the above paragraph is not clear so I don't know 
what you want.

 
 Secondly, I would like the 'conjOp' operator to be optional, and that
 the parser recognises this. This means a query for 'a b' would be
 interpreted as 'a AND b'. I tried replacing the conjOp rule with
 'conjOp : /AND/i | ', but this does not work, as now a query 
 'a AND b' is interpreted as 'a AND and ...'.

Perhaps:

conj  : term conjOp(?) conj | term

It might help if you followed the example given in the PR::D POD more closely.

 
 Any help would be appreciated.
 
 Thanks, Jonas.


--
 Intel, Corp.
 5000 W. Chandler Blvd.
 Chandler, AZ 85226





Re: Parsing binary AND/OR expressions, with brackets

2004-05-17 Thread Ron D. Smith
1) you needed to fix the syntax error.  (It was the 'i' modifier on the 
(s /&&/i) separator.)
2) you did not define nested parentheses as the highest priority and allow 
for any number of them
3) the skip was *demanding* whitespace to be there ('+' instead of '*')

The following produces output but I'm still not sure what you are doing...

use strict;
use vars qw($parser $text %top);
use Data::Dumper;
use Parse::RecDescent;
$RD_WARN  = 1;
$RD_HINT  = 1;
$RD_TRACE = 120;
$parser = Parse::RecDescent->new(q(

mmpfile: chunk(s) /^\Z/
chunk: assignment | ifdef | <error>

assignment: keyword <skip: '[ \t]*'> value(s) {
    push @{$::top{uc $item{keyword}}}, @{$item[-1]};
}

ifdef: /#\s*if/ <skip: '[ \t]*'> conjunction
    <skip: $item[2]> chunk(s?)
    /#\s*endif/

conjunction: disjunction(s /&&/)
disjunction: unary_expr(s /\|\|/)
unary_expr: '!' defined_expr | defined_expr
defined_expr: 'defined' '(' value ')' | 'defined' value | '(' conjunction ')'
value:  /VAL\d+\b/i
keyword: /KEY\d+\b/i

)) or die 'Bad grammar';
$text .= $_ while (<DATA>);
defined $parser->mmpfile($text) or die 'Bad text';
print Data::Dumper->Dump([\%top], [qw(top)]);

__DATA__
#if defined val1
key1  val2
#endif

#if ! ( (defined ( val3 ) ) || ( defined (val4) ) )
key2  val5
#endif
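
On the question of which data structure to keep the operators and conditions in, one option is a nested array tree. The sketch below is a hand-written recursive descent in plain Perl that mirrors the rule layout of the grammar above (so '||' groups more tightly than '&&', as in that grammar) and handles arbitrarily nested parentheses. All subroutine names and the tree shape are assumptions for illustration; characters the tokenizer does not recognise are silently skipped.

```perl
use strict;
use warnings;

#   conjunction : disjunction ('&&' disjunction)*
#   disjunction : unary ('||' unary)*
#   unary       : '!' primary | primary
#   primary     : 'defined' '(' NAME ')' | 'defined' NAME | '(' conjunction ')'
# Each sub consumes tokens from the front of the array it is given.

sub expect {
    my ($tok, $want) = @_;
    my $got = shift @$tok;
    die "expected '$want'\n" unless defined $got && $got eq $want;
}

sub chain {    # shared loop for the two binary levels
    my ($tok, $op, $name, $next) = @_;
    my @parts = ($next->($tok));
    while (@$tok && $tok->[0] eq $op) {
        shift @$tok;
        push @parts, $next->($tok);
    }
    return @parts == 1 ? $parts[0] : [ $name, @parts ];
}

sub conjunction { chain($_[0], '&&', 'and', \&disjunction) }
sub disjunction { chain($_[0], '||', 'or',  \&unary) }

sub unary {
    my ($tok) = @_;
    if (@$tok && $tok->[0] eq '!') {
        shift @$tok;
        return [ 'not', primary($tok) ];
    }
    return primary($tok);
}

sub primary {
    my ($tok) = @_;
    my $t = shift @$tok;
    die "unexpected end of input\n" unless defined $t;
    if ($t eq '(') {                       # nesting: recurse from the top rule
        my $inner = conjunction($tok);
        expect($tok, ')');
        return $inner;
    }
    die "unexpected token '$t'\n" unless $t eq 'defined';
    my $parens = @$tok && $tok->[0] eq '(';
    shift @$tok if $parens;
    my $name = shift @$tok;
    die "missing name after 'defined'\n" unless defined $name && $name =~ /^\w+$/;
    expect($tok, ')') if $parens;
    return [ 'defined', $name ];
}

sub parse_condition {
    my ($src) = @_;
    my @tok = $src =~ /(&&|\|\||[!()]|\w+)/g;
    my $tree = conjunction(\@tok);
    die "trailing input: @tok\n" if @tok;
    return $tree;
}

# Render the tree back as text, fully parenthesised.
sub show {
    my ($op, @kids) = @{ $_[0] };
    return "defined($kids[0])"  if $op eq 'defined';
    return '!' . show($kids[0]) if $op eq 'not';
    return '(' . join($op eq 'and' ? ' && ' : ' || ', map { show($_) } @kids) . ')';
}

print show(parse_condition('! ( (defined ( val3 ) ) || ( defined (val4) ) )')), "\n";
```

A second walker over the same tree could emit GNU make conditionals instead of text.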

On Monday, May 17, 2004 Alexander Farber said:

 Hi,
 
 could someone please help me few steps further?
 I'm almost finished with the parser I need. 
 
 The only missing part is grokking C-preprocessor-like
 #if defined / #endif expressions, which can be surrounded
 by brackets and contain && and || as binary operators.
 
 Unfortunately my script fails already with 
 some syntax error, which I couldn't fix yet:
 
 Parse::RecDescent: Treating conjunction: as a rule declaration
 
 Warning: Undefined (sub)rule keyword used in a production.
   (Hint: Will you be providing this rule later, or did you
  perhaps misspell keyword? Otherwise it will be
  treated as an immediate reject.)
 
 Warning: Undefined (sub)rule value used in a production.
   (Hint: Will you be providing this rule later, or did you
  perhaps misspell value? Otherwise it will be treated
  as an immediate reject.)
 
 Also, please any suggestions on how to handle the
 nested brackets properly? I keep looking at the
 demo_operator.pl and other P::RD examples but couldn't
 figure it out yet...
 
 What I also don't know yet, is in which datastructure
 to save the binary operators and conditions that I'm
 trying to parse. I'm trying to convert the format below 
 to the GNU make's ifdef/ifndef
 
 Regards
 Alex
 
 
 #!/usr/bin/perl -w
 
 use strict;
 use vars qw($parser $text %top);
 use Data::Dumper;
 use Parse::RecDescent;
 $RD_WARN  = 1;
 $RD_HINT  = 1;
 $RD_TRACE = 120;
 $parser = Parse::RecDescent->new(q(
 
 mmpfile: chunk(s) /^\Z/
 chunk: assignment | <error>
 
 assignment: keyword <skip: '[ \t]+'> value(s) {
   push @{$::top{uc $item{keyword}}}, @{$item[-1]};
 }
 
 ifdef: /#\s*if/ <skip: '[ \t]+'> condition
   <skip: $item[2]> chunk(s?)
   /#\s*endif/
 
 condition: '(' conjunction ')' | conjunction
 conjunction: disjunction(s /&&/i)
 disjunction: unary_expr(s /\|\|/i)
 unary_expr: '!' defined_expr | defined_expr  
 defined_expr: 'defined' '(' value ')' | 'defined' value
 
 value:/VAL\d+\b/i
 keyword: /KEY\d+\b/i
 
 )) or die 'Bad grammar';
 $text .= $_ while (<DATA>);
 defined $parser->mmpfile($text) or die 'Bad text';
 print Data::Dumper->Dump([\%top], [qw(top)]);
 
 __DATA__
 #if defined val1
 key1  val2
 #endif
 
 #if ! ( (defined ( val3 ) ) || ( defined (val4) ) )
 key2  val5
 #endif

--
 Intel, Corp.
 5000 W. Chandler Blvd.
 Chandler, AZ 85226





Re: keyword value(s) newline

2004-05-13 Thread Ron D. Smith
On Thursday, May 13, 2004 [EMAIL PROTECTED] said:

   1) you did not read the section on skip very carefully.
 
 The Skipping between terminals section in the man doesn't tell 
 explicitly that skip: /regex/ is not supported. There is an example
 
   <skip: qr/[:,]/>
 
 and I had been misled by that.

Um, yeah, well, this is a simple issue of perl syntax.  The '/' in the above 
is not a regular expression delimiter but a quote delimiter.

 
   2) you do not have balanced delimiters in your parse description
 
 Could you please tell a bit more, what do you mean here? 
 

Simple.  The original supplied script had this:

push @::value, join ' ', @{$item{'value'};

Notice that this is simply missing a trailing '}' before the ';'.  While this 
seems a simple mistake, I have noticed that PR::D does not tolerate perl 
syntax errors well and will supply misleading and possibly erroneous error 
messages.

   4) You did not look at the trace results that you printed 
  out very carefully, 
 
 I do look at the trace, but P::RD is a tough module. 

That's putting it mildly...  The reason for my admonition was simple.  The 
first thing printed out was this:

Parse::RecDescent: Treating mmpfile: as a rule declaration
Parse::RecDescent: Treating chunk(s) as a one-or-more subrule match
Parse::RecDescent: Treating /^\Z/ as a /../ pattern terminal
Parse::RecDescent: Treating chunk: as a rule declaration
Parse::RecDescent: Treating comment as a subrule match
Parse::RecDescent: Treating | as a new production

  Warning: Undefined (sub)rule comment used in a production.
(Hint: Will you be providing this rule later, or did you
   perhaps misspell comment? Otherwise it will be
   treated as an immediate reject.)

This should have been a Dead Giveaway that something was very wrong.  Not 
only that but it points you directly to the first problem (simply because 
that particular production with the skip was where PR::D went awry).

 For example
 I'm irritated by its rightmost column. Near the top it shows:
 
 |c_comment |Didn't match rule |
 | comment  |Didn't match subrule: [c_comment] |
 | comment  |Trying production: [cpp_comment]  |
 | comment  |  |\nTARGET
 |  |  |CNetB.dll\nTARGETTYPE
 |  |  |dll\nUID 0x1E5E
  
  
 |  |  |sdpagent.lib\n\n
 | comment  |Trying subrule: [cpp_comment] |
 |cpp_commen|Trying rule: [cpp_comment]|
 |cpp_commen|Trying production: [m{//\s*(.*)}] |
 |cpp_commen|Trying terminal: [m{//\s*(.*)}]   |
 |cpp_commen|Didn't match terminal |
 |cpp_commen|  |TARGET CNetB.dll\nTARGETTYPE
 |  |  |dll\nUID 0x1E5E
  
  .
 |  |  |sdpagent.lib\n\n
 
 Does the rightmost column hold the content of $text? Why does it
 tell first Trying production and shows \nTARGET, but below
 it tells Didn't match and shows TARGET without the newline?
 
 I would expect it another way around: before the production 
 is tried, the stuff matching $skip is removed isn't it? 
 So it should actually show TARGET, not \nTARGET there.

Um, the technical term for this is "hell if I know".  If you are irritated by 
it now, imagine how irritating it gets when the file you are parsing is HUGE 
and you get the whole thing for each and every attempt...  (I modified the 
PR::D source to truncate the output because of this.) It takes some time to 
learn to read the trace dump, but it is well worth the effort.  You don't 
have to like or even completely understand the output for it to be useful.  
Just watch it progress and when it seems to be misbehaving, study the 
resulting productions and it will usually come to you quickly.

   7) the path delimiter in windoze (you poor soul...) is '\' not '/'
 
 I'm porting that mess to Unix :-) So I'll leave the / there.

That's great, but your test case had windoze filenames in it...

 
  assignment: keyword  <skip: '[ \t]*'> value(s)  <skip: $item[2]>  {
  push @::keyword, $item{keyword};
  push @::value, join ' ', @{$item{'value'}};
  1;
  }
 
 Is restoring the $skip = '\s*' above really needed?

Well, um, actually, no...

 
 And finally my biggest problem right now - the keyword value(s)
 rule is too greedy and consumes the "// added on 01.01.2002" comment, 
 as if it were files:
 
  'LIBRARY' => [
 'euser.lib',
 'efsrv.lib',
 'c32.lib',
 '//',
 'added',
 

Re: keyword value(s) newline

2004-05-12 Thread Ron D. Smith
Oops.  I forgot one important thing that I actually did, that I forgot to 
include in my description.

For the sake of completeness, my actual working script is at the very bottom.


On Wednesday, May 12, 2004 Ron D. Smith said:
 
 There are six problems that you have.
 
 1) you did not read the section on skip very carefully.
 2) you do not have balanced delimiters in your parse description
 3) the second production in the chunk rule does not eliminate leading newlines
 4) You did not look at the trace results that you printed out very carefully, 
 if you had you would have noticed that PR::D was not consuming your *entire* 
 input description.
 5) the item hash does not include the modifiers in the name space.
 6) /^SOURCE/ is a subset of /^SOURCEPATH/
 7) the path delimiter in windoze (you poor soul...) is '\' not '/'
 
 OK, so that's seven, but I didn't expect the Spanish Inquisition.
 
 
 By making these changes I was able to completely parse your test case.
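
Point 6 is easy to reproduce outside P::RD, because alternation in a plain Perl regex is ordered in the same first-match-wins way. The sketch below uses a shortened, made-up keyword list and shows the shadowing, plus one possible guard (sorting the alternatives longest first).

```perl
use strict;
use warnings;

my @keywords = qw(SOURCE SOURCEPATH TARGET TARGETPATH TARGETTYPE UID);

# Ordered alternation: the first alternative that matches wins.
my $naive = join '|', @keywords;
my ($short) = 'SOURCEPATH ..\NetBSrc' =~ /^($naive)/i;

# Longest first, so a keyword can never be shadowed by its own prefix.
my $sorted = join '|', sort { length($b) <=> length($a) } @keywords;
my ($long) = 'SOURCEPATH ..\NetBSrc' =~ /^($sorted)/i;

print "naive order:   $short\n";
print "longest first: $long\n";
```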
 
 On Wednesday, May 12, 2004 [EMAIL PROTECTED] said:
  Sorry, small typo in my mail - the startrule is actually 
  called mmpfile in my script. So the (non-working) script is:
  
   -Original Message-
   From: ext [mailto:[EMAIL PROTECTED]
   
   $parser = Parse::RecDescent->new(q(
   
   mmpfile: chunk(s) /^\Z/
   
   chunk: comment | <skip: /[ \t]*/> assignment | <error>
 
 chunk: comment | assignment | <error>
 
 
   
   comment: c_comment | cpp_comment
   
   cpp_comment: m{//([^\n]*)} {  
 push @::cpp_comment, $1;
 1;
   }
   
   c_comment: m{/[*](.*?)[*]/}s {
 push @::c_comment, $1;
 1;
   }
   
   assignment: keyword value(s) /\n/ {
 
  assignment: keyword <skip: '[ \t]*'>  value(s) /\n/ {
 
 #---^  ^---

assignment: keyword  <skip: '[ \t]*'> value(s)  <skip: $item[2]>  {

 
 push @::keyword, $item{keyword};
 push @::value, join ' ', @{$item{'value(s)'};
 
   push @::value, join ' ', @{$item{'value'}};
 
 
 1;
   }
   
   value: file | type | uid
   
   file: m{[\w\\/.-]+}
 
  file: m{[\w\\.-]+}
 
 
   
   type: /APP/i | /DLL/i
   
   uid: /0x[0-9A-F]+/i
   
   keyword: 
 /^AIF/im |
 /^DOCUMENT/im |
 /^LANG/im |
 /^LIBRARY/im |
 /^RESOURCE/im |
 
 
/^SOURCEPATH/im |
 
 
 /^SOURCE/im |
 /^SYSTEMINCLUDE/im |
 /^TARGETPATH/im |
 /^TARGETTYPE/im |
 /^TARGET/im |
 /^UID/im |
 /^USERINCLUDE/im
   
   )) or die 'Bad grammar';
   $text .= $_ while (<DATA>);
   defined $parser->mmpfile($text) or die 'bad text';
   
   __DATA__
   
   TARGETCNetB.dll
   TARGETTYPEdll
   UID   0x1e5e 0x102F43DB
   
   SOURCEPATH..\NetBSrc 
   SOURCECNetB.cpp  CNetBSerialBase.cpp 
   CNetBBluetoothModule.cpp
   SOURCECSnakeActiveWrapper.cpp
   
   USERINCLUDE   ..\NetBInc
   SYSTEMINCLUDE \Epoc32\include \Epoc32\include\oem
   
   LIBRARY   euser.lib efsrv.lib c32.lib // added on 
   01.01.2002
   LIBRARY   esock.lib bluetooth.lib btdevice.lib 
   btmanclient.lib
   LIBRARY   btextnotifiers.lib sdpagent.lib
   
   /*
   START WINS
   BASEADDRESS 0x4620
   END
   
   #if ( (defined ( WINS ) ) || ( defined (WINSCW) ) )
   SOURCEPATH  ..\SwImp\src
   SOURCE  CApiCamSpecsImpSw.cpp
   #else
   SOURCEPATH  ..\Mirage1\src
   SOURCE  CApiCamHandlerImpMirage1.cpp
   #endif
   */
   
  
  
  And the error message is here (for some reason keyword doesn't match):
  
  |assignment|Trying subrule: [keyword] |
  |assignment|Didn't match subrule: [keyword]   |
 
 
 --
  Intel, Corp.
  5000 W. Chandler Blvd.
  Chandler, AZ 85226
 
 

use strict;
use vars qw($parser $text @c_comment @cpp_comment @keyword @value);
use Parse::RecDescent;
$RD_WARN=1;
$RD_HINT=1;
$RD_TRACE = 1;

$parser = Parse::RecDescent->new(q(

file: chunk(s) /^\Z/

chunk: comment | assignment | <error>

comment: c_comment | cpp_comment

cpp_comment: m{//([^\n]*)} {
push @::cpp_comment, $1;
1;
}

c_comment: m{/[*](.*?)[*]/}s {
push @::c_comment, $1;
1;
}

assignment: keyword  <skip: '[ \t]*'> value(s)  <skip: $item[2]>  {
push @::keyword, $item{keyword};
push @::value, join ' ', @{$item{'value'}};
1;
}

value: file | type | uid

file: m{[\w.-]+}

type: /APP/i | /DLL/i

uid: /0x[0-9A-F]+/i

keyword: 
/^AIF/im |
/^DOCUMENT/im |
/^LANG/im |
/^LIBRARY/im |
/^RESOURCE/im |
/^SOURCEPATH/im |
/^SOURCE/im |
/^SYSTEMINCLUDE/im |
/^TARGETPATH/im |
/^TARGETTYPE/im |
/^TARGET/im |
/^UID/im |
/^USERINCLUDE/im

)) or die 'Bad grammar';
$text .= $_ while (<DATA>);
defined $parser->file($text) or die 'bad text';

__DATA__

TARGET  CNetB.dll
TARGETTYPE  dll
UID 0x1e5e 0x102F43DB

SOURCEPATH  ..\NetBSrc 
SOURCE

Re: Subrule ordering

2003-06-13 Thread Ron D. Smith
On Friday, Jun 13, 2003 Richard Jelinek said:

 Hi Descendants od Rec.
 
 Subrule/production ordering. I just don't get it.
 
 given this grammar
 
 return new Parse::RecDescent (q{
meaning: proplist
{ if(length($text)) {
   print "$text remains unparsed.\n";
   return undef;
  }
  1; }
  | { return undef; }
 
   property:phrase '(' proplist ')'
  | '(' proplist ')'
  | '~' property
  | phrase
 
 
   proplist:property xor_prop(s?)
  | property and_prop(s?)
 
   xor_prop:'|' property
   and_prop:',' property
 
 
   phrase:  /[^\(\)\=\,\|\\~]+/
 
 });
 
 when I try to parse 'SYN(d(e|f))'
 
 everything goes well. If the string is 'SYN(d(e,f))', a syntax error
 is spilled out. If I swap the first two subrules/productions of
 proplist, the situation is vice versa.

Ordering is how you establish priority in ambiguous situations.  If you turn 
on $::RD_TRACE=100 you will see what the parser is doing.  If you follow the 
second SYN(d(e,f)) case you will see that the parser is correct: the grammar 
you describe does not parse that input.  The first production of proplist, 
property xor_prop(s?), succeeds on "e" with zero xor_props, so the second 
production is never tried, and the ',' that follows then fails the enclosing 
property.  It's not a problem with the order, it's a fundamental problem 
with your grammar.  So it's not that you do not understand the ordering 
rules, it's that you do not understand your grammar.

This successfully parses both, but I'm not sure it's what you want.

$::RD_TRACE=100;

$it= new Parse::RecDescent (q{
   meaning: proplist
   { if(length($text)) {
  print $text remains unparsed.\n;
  return undef;
 }
 1; }
 | { return undef; }

  property:phrase '(' proplist ')'
 | '(' proplist ')'
 | '~' property
 | phrase


  proplist:property xorand_prop(s?)

  xorand_prop: xor_prop|and_prop
  xor_prop:'|' property
  and_prop:',' property


  phrase:  /[^\(\)\=\,\|\\~]+/

});

$it->meaning('SYN(d(e|f))');
$it->meaning('SYN(d(e,f))');

 
 I found nowhere in the Parse::RecDescent docs, that the ordering of
 productions does matter. But it seems it does. If there isn't
 something blatantly evident I've overseen this makes writing and
 maintenance of these grammars harder than it could be.
 
 -- 
 best regards,
 
  Dipl.-Inf. Richard Jelinek
 
  - The PetaMem Group - Prague/Nuremberg - www.petamem.com -
  -= 2325182 Mind Units =-

--
 Intel, Corp.
 5000 W. Chandler Blvd.
 Chandler, AZ 85226





Re: binary shifts in p::rd exe block

2003-06-10 Thread Ron D. Smith
On Monday, Jun 9, 2003 h. w. neff said:
 ok, i now have a much pared down example that exhibits the
problem -- at least with attempts to use '<<' in an exe
block.

<blessedly snipped>

Wow.  If that was pared down I would hate to see the whole script :-)

This one is kind of subtle, and reminds me of the quote "nothing can parse 
perl except perl."

The problem is in Text::Balanced::_match_quotelike, where it thinks the '<<' 
is a hereis operator instead of a shift and tries to process it like a 
quoting operator.  Since nothing follows the '<<' that really corresponds to 
a label, it looks for "" instead.  See Text::Balanced::_match_quotelike 
starting at line 714:

    if ($op eq '<<') {
        $ld1pos = pos($$textref);
        my $label;
        if ($$textref =~ m{\G([A-Za-z_]\w*)}gc) {
            $label = $1;
        }
        elsif ($$textref =~ m{ \G ' ([^'\\]* (?:\\.[^'\\]*)*) '
                             | \G " ([^"\\]* (?:\\.[^"\\]*)*) "
                             | \G ` ([^`\\]* (?:\\.[^`\\]*)*) `
                             }gcx) {
            $label = $+;
        }
        else {
            $label = "";
        }


What happens is that $label is being set to null so the quote terminator is 
looking for a BLANK LINE.
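The blank-line terminator is ordinary Perl hereis (heredoc) behaviour and can 
be seen in isolation. A minimal sketch, assuming any reasonably recent perl; 
the variable name $text is only for illustration. A heredoc whose label is 
the empty string runs until the first completely empty line:

```perl
use strict;
use warnings;

# The label is the empty string, so the heredoc ends at the first empty line.
my $text = <<"";
first line
second line

print $text;
```

Everything up to the empty line ends up in $text, which is the same kind of 
swallowing described above when '<<' is mistaken for a hereis operator with 
an empty label.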

Here is a much simpler test case which fails.

use Parse::RecDescent;
use Data::Dumper;

$::RD_ERRORS = 1; # Make sure the parser dies when it encounters an error
$::RD_WARN   = 1; # Enable warnings. This will warn on unused rules &c.
$::RD_HINT   = 1; # Give out hints to help fix problems.

$::RD_TRACE=100;



$mach_grammar = q
{
acu_config_cmd :
';'
{
$main::Instruction_lower = 0x00
| (($item[16] & 0x03)<<10)
;
1;
}

};

my $mach_parser = new Parse::RecDescent ($mach_grammar); 


Where it outputs the following:

Parse::RecDescent: Treating acu_config_cmd : as a rule declaration
Parse::RecDescent: Treating ; as a literal terminal
printing code (3612) to RD_TRACE

But this works:

use Parse::RecDescent;
use Data::Dumper;

$::RD_ERRORS = 1; # Make sure the parser dies when it encounters an error
$::RD_WARN   = 1; # Enable warnings. This will warn on unused rules &c.
$::RD_HINT   = 1; # Give out hints to help fix problems.

$::RD_TRACE=100;



$mach_grammar = q
{
acu_config_cmd :
';'
{
$main::Instruction_lower = 0x00
| (($item[16] & 0x03)<<10)
;

1;
}

};

my $mach_parser = new Parse::RecDescent ($mach_grammar); 


Now PAY ATTENTION:  The only difference between the two cases is the *BLANK* 
line after the expression with the '<<' in it.

Yet this works and correctly picks up the action:

Parse::RecDescent: Treating acu_config_cmd : as a rule declaration
Parse::RecDescent: Treating ; as a literal terminal
Parse::RecDescent: Treating { $main::Instruction_lower = 0x00 |
   (($item[16] & 0x03)<<10) ; 1; } as an action
printing code (4229) to RD_TRACE


The reason for the bug is that it consumes the entire input and the parser 
fails in a bizarre way that makes it think everything after the '<<' is part 
of that hereis literal.

So to fake it out, give it what it wants, which is a blank line after every 
usage of the '<<' shift operator, but you have to do it INSIDE the action.  
The original code had the blank line after the action and the parser 
loses its mind.

So with your original code, change it to this:

acu_config_cmd :
        bus0_p_see_clause
        {
            $main::LastSuccess = "a bus0 'ptr see clause'";
            $main::LookingFor = "a comma and a bus0 'update clause'";
        }
        bus0_update_clause
        {
            $main::LastSuccess = "a bus0 'update clause'";
            $main::LookingFor = "a comma";
        }
        ','
        {
            $main::LastSuccess = "a comma";
            $main::LookingFor = "a bus1 'ptr see clause'";
        }
        bus1_p_see_clause <commit>
        {
            $main::LastSuccess = "a bus1 'ptr see clause'";
            $main::LookingFor = "a comma and a bus1 'update clause'";
        }
        bus1_update_clause
        {
            $main::LastSuccess = "a bus1 'update clause'";
   

Re: binary shifts in p::rd exe block

2003-06-06 Thread Ron D. Smith
On Thursday, Jun 5, 2003 h. w. neff said:

 hi.
 in an attempt to shift & merge returned results, i find i
cannot use '>>' or '<<' in a p::rd executable block; i
must instead divide and multiply, respectively.
 e.g. in this code, 'goodrule' works, 'badrule' does not: 
 
   sub_rule :   /[0-7]/
   goodrule : /reg1/i '=' sub_rule ',' /reg2/i '=' sub_rule ';'
  {
 $main::machineword = $item[3] * 8 | $item[7];
 1
  }
   badrule : /reg1/i '=' sub_rule ',' /reg2/i '=' sub_rule ';'
  {
 $main::machineword = $item[3] << 3 | $item[7];
 1
  }
 
 by 'not working' i mean i get undefined sub-rules and
such as if i was getting file/stream type redirection
instead of the shifts i want.
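For reference, the two actions quoted above are meant to compute the same 
machine word, since a left shift by 3 is a multiplication by 8 and both bind 
tighter than '|'. A quick check in plain Perl, outside any grammar; the @item 
array here is a hypothetical stand-in for what the rule would see on the 
input "reg1 = 1 , reg2 = 2;":

```perl
use strict;
use warnings;

# Stand-in for @item inside the action: index 0 is the rule name,
# then one entry per matched item.
my @item = ('goodrule', 'reg1', '=', 1, ',', 'reg2', '=', 2, ';');

my $by_multiply = $item[3] * 8 | $item[7];   # (1 * 8)  | 2
my $by_shift    = $item[3] << 3 | $item[7];  # (1 << 3) | 2

printf "multiply: %d  shift: %d\n", $by_multiply, $by_shift;
```

Both expressions give 10, so any difference between the two rules has to come 
from how the action text is read, not from the arithmetic.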

I suspect you have not correctly analyzed the problem.  Here is my test case:

#!/usr/intel/pkgs/perl/5.005_03/bin/perl
BEGIN {$::RD_HINT=1;
   $::RD_TRACE=100;}
use lib "my version of 1.80 of P::RD";  # placeholder path
use Parse::RecDescent;
my $grammar = q {
  sub_rule :   /[0-7]/
  goodrule : /reg1/i '=' sub_rule ',' /reg2/i '=' sub_rule ';'
 {
$main::machineword = $item[3] * 8 | $item[7];
 print "item 3 $item[3] $item[7] $main::machineword\n";
1
 }
  badrule : /reg1/i '=' sub_rule ',' /reg2/i '=' sub_rule ';'
 {
$main::machineword = $item[3] << 3 | $item[7];
 print "item 3 $item[3] $item[7] $main::machineword\n";
1
 }
};
my $parserRef = new Parse::RecDescent($grammar); 
print "Good rule returns:",$parserRef->goodrule("reg1 = 1 , reg2 = 2;"), 
" '$main::machineword' \n";
print "Bad rule returns: ",$parserRef->badrule("reg1 = 1 , reg2 = 2;"), 
" '$main::machineword'\n";
exit;


I get the following:

unix> test.pl
Parse::RecDescent: Treating sub_rule : as a rule declaration
Parse::RecDescent: Treating /[0-7]/ as a /../ pattern terminal
Parse::RecDescent: Treating goodrule : as a rule declaration
Parse::RecDescent: Treating /reg1/i as a /../ pattern terminal
Parse::RecDescent: Treating = as a literal terminal
Parse::RecDescent: Treating sub_rule as a subrule match
Parse::RecDescent: Treating , as a literal terminal
Parse::RecDescent: Treating /reg2/i as a /../ pattern terminal
Parse::RecDescent: Treating = as a literal terminal
Parse::RecDescent: Treating sub_rule as a subrule match
Parse::RecDescent: Treating ; as a literal terminal
Parse::RecDescent: Treating { $main::machineword = $item[3] * 8 |
   $item[7]; print "item 3 $item[3] $item[7]
   $main::machineword\n"; 1 } as an action
Parse::RecDescent: Treating badrule : as a rule declaration
Parse::RecDescent: Treating /reg1/i as a /../ pattern terminal
Parse::RecDescent: Treating = as a literal terminal
Parse::RecDescent: Treating sub_rule as a subrule match
Parse::RecDescent: Treating , as a literal terminal
Parse::RecDescent: Treating /reg2/i as a /../ pattern terminal
Parse::RecDescent: Treating = as a literal terminal
Parse::RecDescent: Treating sub_rule as a subrule match
Parse::RecDescent: Treating ; as a literal terminal
Parse::RecDescent: Treating { $main::machineword = $item[3] << 3 |
   $item[7]; print "item 3 $item[3] $item[7]
   $main::machineword\n"; 1 } as an action
printing code (21354) to RD_TRACE
|  goodrule  |Trying rule: [goodrule]   |
|  goodrule  |  |reg1 = 1 , reg2 = 
2;
|  goodrule  |Trying production: [/reg1/i '=' sub_rule  |
||',' /reg2/i '=' sub_rule ';'] |
|  goodrule  |Trying terminal: [/reg1/i]|
|  goodrule  |Matched terminal (return value:   |
||[reg1])   |
|  goodrule  |  | = 1 , reg2 = 2;
|  goodrule  |Trying terminal: ['=']|
|  goodrule  |Matched terminal (return value: [=])  |
|  goodrule  |  | 1 , reg2 = 2;
|  goodrule  |Trying subrule: [sub_rule]|
|  sub_rule  |Trying rule: [sub_rule]   |
|  sub_rule  |Trying production: [/[0-7]/]  |
|  sub_rule  |Trying terminal: [/[0-7]/]|
|  sub_rule  |Matched terminal (return value: [1])  |
|  sub_rule  |  | , reg2 = 2;
|  sub_rule  |Matched production: [/[0-7]/] |
|  sub_rule  |Matched rule (return value: [1])  |
|  sub_rule  |(consumed: [ 1])  |
|  goodrule  |Matched subrule: [sub_rule] (return   |
||value: [1]|
|  goodrule  |Trying terminal: [',']|
|  goodrule  |Matched terminal (return value: [,])  |
|  goodrule  |  | reg2 = 2;
|  goodrule  |Trying terminal: [/reg2/i]|
|  goodrule  |Matched terminal (return value:   |
||[reg2])   |