Re: locale problem with P::RD

2004-07-16 Thread Karl Gaissmaier
Karl Gaissmaier wrote:
Hi,
I have nearly finished Config::Scoped, yet another config
file parser, but I am currently stuck on an annoying
locale problem. For example, I am not able to match
German umlauts with the rule pattern /\w/, even with
the proper LC_... env and 'use locale' in P::RD.
With a plain pattern match 'string' =~ /\w+/ it works!
Please check my stripped-down code snippet:
# locale_test.pl 
use locale;
use Parse::RecDescent;
...
The problem is the lexical scope of 'use locale': the pragma only
affects code compiled within the enclosing file or block.
Best Regards
Charly
--
Karl Gaissmaier   KIZ/Infrastructure, University of Ulm, Germany
Email:[EMAIL PROTECTED]   Service Group Network


Re: error messages

2004-07-16 Thread Karl Gaissmaier
Jonas Wolf wrote:
I tried redirecting STDERR to a variable or a file, but this does not take 
effect inside P::RD because of the way the STDERR is handled. I'd prefer 
not to meddle with P::RD, but if that's the only solution then I will.
Here is some sample code which illustrates my point. The "This is an error 
message\n" gets read into the variable as expected, but P::RD's error 
messages are printed to STDERR nonetheless.
You can cheat P::RD by directly accessing $thisparser->{errors};
see the FAQ entry 'Accessing error data' and the answer by Damian.
Best Regards
Charly
--
Karl Gaissmaier   KIZ/Infrastructure, University of Ulm, Germany
Email:[EMAIL PROTECTED]   Service Group Network


Re: keyword value(s) newline

2004-05-14 Thread Karl Gaissmaier
Ron D. Smith wrote:

...
Um, the technical term for this is "hell if I know".  If you are irritated by 
it now, imagine how irritating it gets when the file you are parsing is HUGE 
and you get the whole thing for each and every attempt...  (I modified the 
P::RD source to truncate the output because of this.) It takes some time to 

Not necessary, just use $::RD_TRACE = 120; # since P::RD version 1.20

Defining $::RD_TRACE causes the parser generator and the
parser to report their progress to STDERR in
excruciating detail (although, without hints unless
$::RD_HINT is separately defined). This detail can be
moderated in only one respect: if $::RD_TRACE has an
integer value (N) greater than 1, only the N characters
of the current parsing context (that is, where in the
input string we are at any point in the parse) is
reported at any time.
Best Regards
Charly
--
Karl Gaissmaier   KIZ/Infrastructure, University of Ulm, Germany
Email:[EMAIL PROTECTED]   Service Group Network


Re: Negative Look-ahead problem

2004-04-05 Thread Karl Gaissmaier
Hi Andras

Karl Gaissmaier wrote:

Hmm, I overlooked that the global skip pattern
is qr/\s*/; therefore ANDY is matched as a reserved
word too, since the token prefix can be just nothing.


Changing the token prefix in the RESERVED rule
to \s+ (note the + instead of *) should help:


Thank you very much for your help!
The fix you suggested did solve my problem.
(I am still mystified about how demo_Cgrammar is supposed to work since it
does not make use of the skip directive.)
I don't think there is any mystery: demo_Cgrammar is
just a demo and will also fail this test ;-)
Best Regards
Charly
--
Karl Gaissmaier   KIZ/Infrastructure, University of Ulm, Germany
Email:[EMAIL PROTECTED]   Service Group Network
Tel.: ++49 731 50-22499


Re: Perhaps a FAQ: How to shortcut an alternation if one already commited subrule fails

2004-04-01 Thread Karl Gaissmaier
Hi Sean and other P::RD participants,

Sean O'Rourke wrote:

Barring Perl 6, one thing you can do is lift the unique prefixes up
into the rule you want to fail, e.g.:
ENTITY : ( '#' <commit> COMMENT
 | IDENT '=' <commit> OPTION
 | TYPE NAME '{' <commit> DECLARATION
 | 'scope' <commit> SCOPE
 )(s)
COMMENT : m/.*/

OPTION  : VALUE
   { store($item{IDENT}, $item{VALUE}) }
DECL: DECL_BODY '}'
   { store($item{TYPE}, $item{NAME}, $item{DECL_BODY}) }
SCOPE   : '{' ENTITY(s) '}'

On the other hand, this _does_ highlight exactly what prefixes you're
using to distinguish between rules.  And it suggests you might want
to put SCOPE right after comment, since it is accepted or rejected
based on a single, simple token.


Sure, this is one possibility, but it has a lot of drawbacks.

In my example I need the prefixes for processing in
the subrule actions, and the parent rule has already
consumed them. OK, I could send them downstream as arguments,
but this makes the grammar very ugly and hard to maintain.
The biggest drawback is the error message:
by committing the different productions already
in the ENTITY rule, I just get an error message stating
that an ENTITY failed, not whether it was an already committed
OPTION or DECLARATION etc.
I was thinking about a unique prefix lookahead, but this
failed due to multi-token prefixes like TYPE VALUE
in the declaration. OK, again you could move this
into one more level of indirection, but this makes the grammar even
more unreadable and hard to maintain (the example is just
a shrunk-down extract from a huge grammar).
Hmm, any solutions with this (now more detailed) problem
in mind?
Best Regards
Charly
--
Karl Gaissmaier   KIZ/Infrastructure, University of Ulm, Germany
Email:[EMAIL PROTECTED]   Service Group Network
Tel.: ++49 731 50-22499


Annotation/correction? to a FAQ topic

2004-04-01 Thread Karl Gaissmaier
Hi P::RD lovers and FAQ maintainer(s),

I stumbled over a piece of code in the FAQ
belonging to this topic:
Commit in subrule which is optional in rule

The question was how to fail a parent rule when an optional
subrule has already committed but fails after the commit.
Damian suggests a negative look-ahead following the optional
subrule production, which works if you
have an { action } and not just the default return result.
Later on in this chapter there is an optimization example
by Marcel Grunaer which isn't working correctly IMHO:
Marcel went on to point out an optimization:

another option would be the use of a rulevar:

  myrule : <rulevar: local $failed>
  myrule : 'stuff' mysubrule(?) <reject:$failed>
  mysubrule: ID <commit> '[' ']'
           | <error?> { $failed++ }
The rule 'myrule' should fail if the subrule 'mysubrule'
has already committed. This will not happen; let me explain:
Case 1: the subrule 'mysubrule' fails before commit

the production '| <error?> { $failed++ }' returns
0 (not undef!) for the following reasons:
<error?> returns 0, since it wasn't committed (see below)
{ $failed++ } returns 0, since it's a postincrement
of a formerly undefined value.
The subrule matches, but the parent rule isn't successful,
because the <reject: $failed> matches.
This is not the intended behavior; myrule should
match, as the subrule didn't commit.
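
The postincrement point in Case 1 can be checked in plain Perl, without P::RD. This is only a minimal sketch: the variable names are chosen for illustration, and it shows nothing beyond the value an action like { $failed++ } would hand back.

```perl
use strict;
use warnings;

# $failed starts out undefined, like a freshly localized rulevar.
my $failed;

# Postincrement returns the OLD value; an undefined value is returned as 0.
my $result = $failed++;

print "action value is ", (defined $result ? "defined" : "undef"), ": $result\n";
print "failed is now $failed\n";
```

Since an action succeeds whenever its value is defined, the 0 counts as success, and $failed is already 1 when the parent rule reaches its reject directive.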
Case 2: the subrule 'mysubrule' fails after commit

the production '| <error?> { $failed++ }' comes to
the directive <error?>; this directive matches and returns
undef, as an <error...> directive should do.
This means you will never get to the { $failed++ } action.
The subrule fails, the rule is successful since we have
the optional (?), and $failed is still not set.
The usual ' | <error?> <reject> ' pattern will have misled
Marcel and all other FAQ readers until now, because it
suggests that after a successful <error?> directive the
subrule continues. But this isn't correct; the <reject>
directive is needed for uncommitted errors.
Hmm, you will ask, why is that? We only hit this production
if we are committed, since the first directive is <error?>.
No: when an <error> OR <error?> is the first directive in a
production, an implicit uncommit is fired.
Sure, it's difficult but useful, and don't forget it's from Damian ;-)

Just in case, see my attached code; this is the last source
of truth if Damian has no time to follow this mailing list:
 snip 
#!/usr/local/bin/perl
use strict;
use warnings;
use Parse::RecDescent;
$::RD_TRACE = 1;
use Data::Dumper;
my $grammar = <<'EOG';
myrule: <rulevar: local $failed>
myrule: mysubrule(?) <reject:$failed> { $return = 'success!' }
mysubrule : 'ID' <commit> '[' ']'
| <error?> { $failed++ }
EOG
my $parser = Parse::RecDescent->new($grammar)
or die "can't create parser";
# input is read from STDIN or from files named on the command line
my $text = join '', <>;
print Dumper($parser->myrule($text));
 snip 
Best Regards
Charly
--
Karl Gaissmaier   KIZ/Infrastructure, University of Ulm, Germany
Email:[EMAIL PROTECTED]   Service Group Network
Tel.: ++49 731 50-22499


Perhaps a FAQ: How to shortcut an alternation if one already commited subrule fails

2004-03-31 Thread Karl Gaissmaier
Hi P::RD lovers,

Is there any general advice on how to shortcut an alternation in a rule
if an already committed subrule fails?
Example:

ENTITY : ( COMMENT | OPTION | DECLARATION | SCOPE )(s)

COMMENT : '#' <commit> m/.*/
  | <error?> <reject>
OPTION  : IDENT '=' <commit> VALUE
  { store($item{IDENT}, $item{VALUE}) }
  | <error?> <reject>
DECL: TYPE NAME '{' <commit> DECL_BODY '}'
  { store($item{TYPE}, $item{NAME}, $item{DECL_BODY}) }
  | <error?> <reject>
SCOPE   : 'scope' <commit> '{' ENTITY(s) '}'
  | <error?> <reject>
I would like to see ENTITY fail immediately if,
e.g., the DECL subrule fails after it has already committed.
Is there a general usage pattern to shortcut the ENTITY rule
if an already committed subrule fails?
Best Regards
Charly
--
Karl Gaissmaier   KIZ/Infrastructure, University of Ulm, Germany
Email:[EMAIL PROTECTED]   Service Group Network
Tel.: ++49 731 50-22499


Re: Negative Look-ahead problem

2004-03-31 Thread Karl Gaissmaier
Andras Karacsony wrote:

Hello Karl!

No, because the C grammar example doesn't tweak the skip pattern either.

(I made my program work by 'preprocessing' the target text.

$_ = lc ;
s/\b(and|or)\b/\U$1/g; 

my $tree = $parser->start($_);

and getting rid of the 'i'

reserved_word: 'AND' | 'OR' 
WORD:  ...!reserved_word /[a-z0-9]+/
)
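
The preprocessing step quoted above can be tried on its own in plain Perl. This is a rough sketch with a made-up input line (the sample text is an assumption, not taken from the thread): everything is lowercased first, then only the standalone words 'and' and 'or' are uppercased again.

```perl
use strict;
use warnings;

# Hypothetical sample input, chosen for illustration only.
my $line = "Andy AND Bob or Carl";

# Lowercase everything, then uppercase the whole words 'and' / 'or'.
# \b keeps 'andy' untouched, because 'and' is followed by a word character.
my $prepared = lc $line;
$prepared =~ s/\b(and|or)\b/\U$1/g;

print "$prepared\n";
```

After this step the case-sensitive literals 'AND' and 'OR' can only match the real operators, which is why the 'i' modifier on the word pattern could be dropped.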

This is really not necessary. I think you should shrink
it down to a working example like:
 snip 
#!/usr/local/bin/perl
use strict;
use warnings;
use Parse::RecDescent;
$::RD_TRACE = 1;
use Data::Dumper;
my $grammar = <<'EOG';
<autotree>
RESERVED : 'AND' | 'OR'
IDENTIFIER : ...!RESERVED m/[a-z0-9]+/i
EOG
my $parser = Parse::RecDescent->new($grammar)
or die "can't create parser";
# input is read from STDIN or from files named on the command line
my $text = join '', <>;
print Dumper($parser->IDENTIFIER($text));
 snip 
And this works very well. I think there is another
problem hidden in your code.
Best Regards
Charly
Andras


-Original Message-
From: Karl Gaissmaier [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 31, 2004 4:24 AM
To: PerlDiscuss - Perl Newsgroups and mailing lists
Subject: Re: Negative Look-ahead problem
PerlDiscuss - Perl Newsgroups and mailing lists wrote:


reserved_word: 'AND' | 'OR' 
WORD:  ...!reserved_word /[a-z0-9]+/i 

Using the above grammar, rule WORD fails to match any token that starts
with a reserved word (e.g. Andy). That is not what I expected, since in
'demo_Cgrammar.pl' we have:

IDENTIFIER: ...!reserved_word /[a-z]\w*/i

Am I missing something?

Thanks!
Andras
Hmm, did you tweak the skip pattern?
Regards
Charly
--
Karl Gaissmaier   KIZ/Infrastructure, University of Ulm, Germany
Email:[EMAIL PROTECTED]   Service Group Network
Tel.: ++49 731 50-22499







Re: skip: over comments and misleading error messages

2002-04-17 Thread Karl Gaissmaier

Hi Yves, (and perhaps Damian, if he has time to read
until the end, sorry)

 Orton, Yves wrote:
 
  oh fine, the next time I will look first in the source!
 
 No. Read the docs first. Its there... :-)

Oh god, I read it, at least two times, and also the FAQ. The next
time I will use grep, sigh.

And later on in the doc (shame on me, but anyway it
doesn't solve the problem with misleading error messages)

 Terminal Separators
 
 For the purpose of matching, each terminal in a production is
 considered to be preceded by a ``prefix'' - a pattern which
 must be matched before a token match is attempted. By default,
 the prefix is optional whitespace (which always matches, at
 least trivially), but this default may be reset in any production.
 
 The variable $Parse::RecDescent::skip stores the universal
 prefix, which is the default for all terminal matches in
 all parsers built with Parse::RecDescent.
 
 The prefix for an individual production can be altered by
 using the <skip:...> directive (see below).

but, and that was my problem, this is no longer mentioned
under the explanation of the <skip:...> directive.

  but not proper matching, it's already stopping at the first
  line comment
  and therefore you get this ERROR messages as you get.
 
 Er, I dont understand you. That pattern will skip all line comments
 and whitespace. (Well, actually P::RD will match that regex repeated
 times as is necessary.)

no, look at this code (your regex) and trace and output:

#!/usr/bin/perl -w
use Parse::RecDescent;
$Parse::RecDescent::skip = qr{(^\s+|#.*$)+};
$RD_TRACE = 1;
my $grammar = <<'EOGRAMMAR';
file: int(s) /\z/
| <error>
 
int : /[+-]?\d+/
| <error>
EOGRAMMAR

my $parser = Parse::RecDescent->new($grammar);

my $text = <<'EOTEXT';
# comment
.123
EOTEXT

my $result = $parser->file($text);

Parse::RecDescent: Treating file : as a rule declaration
Parse::RecDescent: Treating int(s) as a one-or-more subrule match
Parse::RecDescent: Treating /\z/ as a /../ pattern terminal
Parse::RecDescent: Treating | <error> as a new (error) production
Parse::RecDescent: Treating <error> as an error marker
Parse::RecDescent: Treating int : as a rule declaration
Parse::RecDescent: Treating /[+-]?\d+/ as a /../ pattern terminal
Parse::RecDescent: Treating | <error> as a new (error) production
Parse::RecDescent: Treating <error> as an error marker
printing code (10158) to RD_TRACE
|   file   |Trying rule: [file]               |
|   file   |                                  |# comment\n .123\n
|   file   |Trying production: [int /\z/]     |
|   file   |Trying repeated subrule: [int]    |
|   int    |Trying rule: [int]                |
|   int    |Trying production: [/[+-]?\d+/]   |
|   int    |Trying terminal: [/[+-]?\d+/]     |
|   int    |Didn't match terminal             |
|   int    |                                  |# comment\n .123\n
|   int    |Trying production: [<error...>]   |
|   int    |                                  |# comment\n .123\n
|   int    |Trying directive: [<error...>]    |
|   int    |                                  |# comment\n .123\n
|   int    |Didn't match directive            |
|   int    |Didn't match rule                 |
|   file   |Didn't match repeated subrule:    |
|          |[int]                             |
|   file   |                                  |# comment\n .123\n
|   file   |Trying production: [<error...>]   |
|   file   |Trying directive: [<error...>]    |
|   file   |Didn't match directive            |
|   file   |Didn't match rule                 |

   ERROR (line 1): Invalid int: Was expecting /[+-]?\\d+/

   ERROR (line 1): Invalid file: Was expecting int

As you can see, the parser never skips over the first
comment with this skip regex, and therefore the error message
is correct (everything happened in line 1).
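
The effect of the anchors can also be checked in plain Perl, without P::RD. The sketch below rests on a simplifying assumption: strip_prefix is a hypothetical helper that merely removes the skip pattern from the start of the text, which is a stand-in for the prefix handling and not P::RD's actual code. It compares the anchored regex from the script above with a variant of the same alternatives without ^ and $.

```perl
use strict;
use warnings;

# Hypothetical helper: remove the skip pattern from the start of the text,
# roughly what has to happen before a terminal can be tried.
sub strip_prefix {
    my ($skip, $text) = @_;
    $text =~ s/\A(?:$skip)//;
    return $text;
}

my $text = "# comment\n.123\n";

my $anchored  = qr{(^\s+|#.*$)+};   # regex from the script above
my $flattened = qr{(\s+|#.*)+};     # same alternatives without ^ and $

my $after_anchored  = strip_prefix($anchored,  $text);
my $after_flattened = strip_prefix($flattened, $text);

for my $case ([anchored => $after_anchored], [flattened => $after_flattened]) {
    (my $shown = $case->[1]) =~ s/\n/\\n/g;   # make newlines visible
    print "$case->[0]: $shown\n";
}
```

Without the /m modifier, $ only matches at the very end of the string (or before a final newline), so '#.*$' cannot stop at the end of the first line and the anchored pattern removes nothing. The variant without anchors consumes the comment and the newline, leaving '.123\n'.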

and with my flattened regex (thanks to your hints!)

#!/usr/bin/perl -w
use Parse::RecDescent;
$Parse::RecDescent::skip = qr{(\s+|#.*)+};
$RD_TRACE = 1;
my $grammar = <<'EOGRAMMAR';
file: int(s) /\z/
| <error>
 
int : /[+-]?\d+/
| <error>
EOGRAMMAR

my $parser = Parse::RecDescent->new($grammar);

my $text = <<'EOTEXT';
# comment
.123
EOTEXT

my $result = $parser->file($text);

Parse::RecDescent: Treating file : as a rule declaration
Parse::RecDescent: Treating int(s) as a one-or-more subrule match
Parse::RecDescent: Treating /\z/ as a /../ pattern terminal
Parse::RecDescent: Treating | <error> as a new (error) production
Parse::RecDescent: Treating <error> as an error marker
Parse::RecDescent: Treating int : as a rule declaration
Parse::RecDescent: Treating /[+-]?\d+/ as a /../ pattern terminal
Parse::RecDescent: Treating | 

Re: Feature wishlist for P::RD 2.0 ore perhaps 1.81

2002-04-16 Thread Karl Gaissmaier

Hi Jonathan,

Jonathan Mayer wrote:
 
 Apologies for butting in where my opinion is not asked for, but ...
 

You're welcome; it's a mailing list, isn't it?


  statement: A! | B! | C | D
 
  is so easy to understand: At least A and B, optionally C and/or D
  but without ORDER, in comparison to action codes and greps and maps
  and line noise.
 
 At what point does A! become mandatory?  In the
 block: statement(s)  /* A or B must be part of block */
 construct?  Or in the
 program: block(s)/* A or B must be part of program */
 block: statement(s)
 construct?  There are times when both forms are useful -- but defining
 mandatory as part of the syntax for the singular statement
 construct is limiting.

Hmm, what is wrong if it is required for the block statement? Then
it is automatically true for the program as well.

 
 Also, what if the programmer wants a more complicated logical function
 on the set of statements that comprises a minimal block?
 
 It seems to me, P:RD already has the functionality you desire, in a
 much more flexible form.  What's wrong with:
 
 statement: A | B | C | D
 block: { statements(s) }
 {
 /* some code to test for the presence of A and B,
else return undef */
 }

The syntax is then scattered between grammar rules and action code, and the
presence check isn't always as intuitive as in this simple example.
A and B may also be complex subrules, and it is really only a question
of style.

 
 I'd hate to see P:RD fall into same trap as regexps: P:RD doesn't need
 to be a complete programming language.  P:RD is fine as a perl
 accessory.

Yes, it's really fine, even if it stays as it already is.

Regards
Charly
-- 
Karl Gaissmaier  Computing Center,University of Ulm,Germany
Email:[EMAIL PROTECTED]  Network Administration
Tel.: ++49 731 50-22499