Re: locale problem with P::RD
Karl Gaissmaier wrote:

> Hi,
>
> I have nearly finished Config::Scoped, yet another config file parser, but I am currently stuck on an annoying locale problem. For example, I am not able to match German umlauts with the rule pattern /\w/, even with the proper LC_... environment and 'use locale' in P::RD. With a plain pattern match, 'string' =~ /\w+/, it works!
>
> Please check my stripped-down code snippet:
>
>     # locale_test.pl
>     use locale;
>     use Parse::RecDescent;
>     ...

The problem is the lexical scope of 'use locale': the pragma only affects the file or block in which it appears.

Best Regards
Charly

-- 
Karl Gaissmaier      KIZ/Infrastructure, University of Ulm, Germany
Email:[EMAIL PROTECTED]      Service Group Network
Re: error messages
Jonas Wolf wrote:

> I tried redirecting STDERR to a variable or a file, but this does not take effect inside P::RD because of the way STDERR is handled. I'd prefer not to meddle with P::RD, but if that's the only solution then I will.
>
> Here is some sample code which illustrates my point. The "This is an error message\n" gets read into the variable as expected, but P::RD's error messages are printed to STDERR nonetheless.

You can cheat P::RD by accessing $thisparser->{errors} directly. See the FAQ entry 'Accessing error data' and the answer by Damian.

Best Regards
Charly
Re: keyword value(s) newline
Ron D. Smith wrote:

> ...
> Um, the technical term for this is "hell if I know". If you are irritated by it now, imagine how irritating it gets when the file you are parsing is HUGE and you get the whole thing for each and every attempt... (I modified the P::RD source to truncate the output because of this.) It takes some time to ...

That is not necessary, just use

    $::RD_TRACE = 120;    # since P::RD version 1.20

From the documentation:

    Defining $::RD_TRACE causes the parser generator and the parser to report their progress to STDERR in excruciating detail (although, without hints unless $::RD_HINT is separately defined). This detail can be moderated in only one respect: if $::RD_TRACE has an integer value (N) greater than 1, only the N characters of the current parsing context (that is, where in the input string we are at any point in the parse) is reported at any time.

Best Regards
Charly
Re: Negative Look-ahead problem
Hi Andras,

> Karl Gaissmaier wrote:
>
>> hmmm, I overlooked that the global skip pattern is qr/\s*/, so ANDY is matched as a reserved word too, since the token prefix can be just nothing. Changing the token prefix in the RESERVED rule to \s+ (<-- see the + instead of *) should help:
>
> Thank you very much for your help! The fix you suggested did solve my problem. (I am still mystified about how demo_Cgrammar is supposed to work since it does not make use of the skip directive.)

I don't think there is any mystery: demo_Cgrammar is just a demo and will also fail this test ;-)

Best Regards
Charly

-- 
Karl Gaissmaier      KIZ/Infrastructure, University of Ulm, Germany
Email:[EMAIL PROTECTED]      Service Group Network
Tel.: ++49 731 50-22499
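The effect discussed in this thread can be reproduced without P::RD. The following is a minimal sketch in plain Perl, not the code that P::RD generates: the sub names and sample tokens are made up for illustration, and the \b word-boundary check is a swapped-in plain-regex technique rather than the \s+ prefix fix discussed above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: emulates the literal terminals 'AND' | 'OR' preceded
# by an optional-whitespace prefix (\s*). It only looks at the start of the
# text, so a longer word that merely begins with a keyword also matches.
sub starts_with_reserved {
    my ($text) = @_;
    return $text =~ /\A\s*(?:AND|OR)/ ? 1 : 0;
}

# Hypothetical helper: the same check, but the keyword has to end at a
# word boundary, so 'ANDY' is no longer mistaken for 'AND'.
sub starts_with_reserved_word {
    my ($text) = @_;
    return $text =~ /\A\s*(?:AND|OR)\b/ ? 1 : 0;
}

for my $token ('ANDY', 'AND', 'ORACLE', 'OR x') {
    printf "%-7s prefix-only: %d  word-boundary: %d\n",
        $token,
        starts_with_reserved($token),
        starts_with_reserved_word($token);
}
```

With the prefix-only check, a negative look-ahead such as ...!RESERVED would reject 'ANDY', which matches the symptom reported in the thread.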
Re: Perhaps a FAQ: How to shortcut an alternation if one already commited subrule fails
Hi Sean and other P::RD participants,

Sean O'Rourke wrote:

> Barring Perl 6, one thing you can do is lift the unique prefixes up into the rule you want to fail, e.g.:
>
>     ENTITY : (   '#' <commit> COMMENT
>                | IDENT '=' <commit> OPTION
>                | TYPE NAME '{' <commit> DECLARATION
>                | 'scope' <commit> SCOPE
>              )(s)
>
>     COMMENT : m/.*/
>
>     OPTION : VALUE
>              { store($item{IDENT}, $item{VALUE}) }
>
>     DECLARATION : DECL_BODY '}'
>              { store($item{TYPE}, $item{NAME}, $item{DECL_BODY}) }
>
>     SCOPE : '{' ENTITY(s) '}'
>
> On the other hand, this _does_ highlight exactly what prefixes you're using to distinguish between rules. And it suggests you might want to put SCOPE right after COMMENT, since it is accepted or rejected based on a single, simple token.

Sure, this is one possibility, but it has a lot of drawbacks. In my example I need the prefixes for processing in the subrule actions, and the parent rule has already consumed them. OK, I could send them downstream as arguments, but this makes the grammar very ugly and hard to maintain.

The biggest drawback is the error message: by committing the different productions already in the ENTITY rule, I just get an error message stating that an ENTITY failed, not whether it was an already committed OPTION or DECLARATION etc.

I was thinking about a unique prefix look-ahead, but this failed due to the multi-token prefixes like TYPE NAME in the declaration. OK, again you could collapse this into one more indirection, but this makes the grammar even more unreadable and hard to maintain (the example is just a shrunk-down extract of a huge grammar).

Hmm, any solutions with this, now more detailed, problem in mind?

Best Regards
Charly
Annotation/correction? to a FAQ topic
Hi P::RD lovers and FAQ maintainer(s),

I stumbled over a piece of code in the FAQ belonging to this topic:

    Commit in subrule which is optional in rule

The question was how to fail a parent rule when an optional subrule has already committed but fails after the commit. Damian suggests a negative look-ahead following the optional subrule production, which works if you have an { action } and not just the default return result.

Later in this chapter there is an optimization example by Marcel Grunaer which, IMHO, does not work correctly:

> Marcel went on to point out an optimization:
>
> another option would be the use of a rulevar:
>
>     myrule : <rulevar: local $failed>
>     myrule : 'stuff' mysubrule(?) <reject:$failed>
>
>     mysubrule: ID <commit> '[' ']'
>              | <error?> { $failed++ }

The rule 'myrule' should fail if the subrule 'mysubrule' has already committed. This will not happen, let me explain:

Case 1: the subrule 'mysubrule' fails before the commit

    The production '| <error?> { $failed++ }' returns 0 (not undef!) for the following reasons:

    - <error?> returns 0, since the rule wasn't committed (see below)
    - { $failed++ } returns 0, since it's a post-increment of a previously undefined value

    The subrule matches, but the parent rule isn't successful, because the <reject:$failed> matches. This is not the intended behavior: myrule should match, as the subrule didn't commit.

Case 2: the subrule 'mysubrule' fails after the commit

    The production '| <error?> { $failed++ }' reaches the directive <error?>. This directive matches and returns undef, as an <error...> directive should. This means you will never get to the { $failed++ } action. The subrule fails, the rule is successful since we have the optional (?), and $failed is still not set.

The usual ' | <error?> <reject> ' pattern will have misled Marcel and all other FAQ readers until now, because it suggests that the subrule continues after a successful <error?> directive. But this isn't correct: the <reject> directive is needed for uncommitted errors.
Hmmm, you will ask, why is that? Surely we only hit this production if we are committed, since the first directive is <error?>. No: when an <error> OR <error?> is the first directive in a production, an implicit <uncommit> is fired. Sure, it's difficult but useful, and don't forget it's from Damian ;-)

Just in case, see my attached code. This is the last source of truth if Damian has no time to follow this mailing list:

snip
    #!/usr/local/bin/perl
    use strict;
    use warnings;

    use Parse::RecDescent;
    $::RD_TRACE = 1;
    use Data::Dumper;

    my $grammar = <<'EOG';
    myrule: <rulevar: local $failed>
    myrule: mysubrule(?) <reject:$failed>
            { $return = 'success!' }

    mysubrule : 'ID' <commit> '[' ']'
              | <error?> { $failed++ }
    EOG

    my $parser = Parse::RecDescent->new($grammar)
      or die "can't create parser";

    # The filehandle was lost in the archive; <> (STDIN or files named on
    # the command line) is an assumption.
    my $text = join '', <>;
    print Dumper($parser->myrule($text));
snip

Best Regards
Charly
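The claim in Case 1, that { $failed++ } yields 0 rather than undef, is plain Perl behavior and can be checked without P::RD. The following is a minimal sketch; the sub name is illustrative, and the connection to P::RD is the point made in the message above, namely that a defined action value (even 0) counts as a match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns the value produced by a post-increment on a fresh (undefined)
# variable, together with the variable's value afterwards.
sub post_increment_undef {
    my $failed;                # undef, like a freshly localized rulevar
    my $result = $failed++;    # post-increment yields the old value, numified
    return ($result, $failed);
}

my ($result, $failed) = post_increment_undef();

# prints "result is defined: 0"
print "result is ", (defined $result ? "defined: $result" : "undef"), "\n";

# prints "failed is now 1"
print "failed is now $failed\n";
```

So the action both succeeds (its value is defined) and sets $failed, which is why the later <reject:$failed> fires even though the subrule never committed.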
Perhaps a FAQ: How to shortcut an alternation if one already commited subrule fails
Hi P::RD lovers,

is there any general advice on how to shortcut an alternation in a rule if an already committed subrule fails?

Example:

    ENTITY : ( COMMENT | OPTION | DECLARATION | SCOPE )(s)

    COMMENT : '#' <commit> m/.*/
            | <error?> <reject>

    OPTION : IDENT '=' <commit> VALUE
             { store($item{IDENT}, $item{VALUE}) }
           | <error?> <reject>

    DECLARATION : TYPE NAME '{' <commit> DECL_BODY '}'
             { store($item{TYPE}, $item{NAME}, $item{DECL_BODY}) }
           | <error?> <reject>

    SCOPE : 'scope' <commit> '{' ENTITY(s) '}'
          | <error?> <reject>

I would like to see ENTITY fail immediately if, e.g., the DECLARATION subrule fails after it has already committed.

Is there a general usage pattern to shortcut the ENTITY rule if an already committed subrule fails?

Best Regards
Charly
Re: Negative Look-ahead problem
Andras Karacsony wrote:

> Hello Karl!
>
> No, because the C grammar example doesn't tweak the skip pattern either.
> (I made my program work by 'preprocessing' the target text:
>
>     $_ = lc ;
>     s/\b(and|or)\b/\U$1/g;
>     my $tree = $parser->start($_);
>
> and getting rid of the 'i':
>
>     reserved_word: 'AND' | 'OR'
>     WORD: ...!reserved_word /[a-z0-9]+/
> )

This is really not necessary. I think you should shrink it down to a working example like:

snip
    #!/usr/local/bin/perl
    use strict;
    use warnings;

    use Parse::RecDescent;
    $::RD_TRACE = 1;
    use Data::Dumper;

    my $grammar = <<'EOG';
    <autotree>
    RESERVED   : 'AND' | 'OR'
    IDENTIFIER : ...!RESERVED m/[a-z0-9]+/i
    EOG

    my $parser = Parse::RecDescent->new($grammar)
      or die "can't create parser";

    # The filehandle was lost in the archive; <> (STDIN or files named on
    # the command line) is an assumption.
    my $text = join '', <>;
    print Dumper($parser->IDENTIFIER($text));
snip

and this works very well. I think there is another problem hidden in your code.

Best Regards
Charly

> Andras
>
> -----Original Message-----
> From: Karl Gaissmaier [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, March 31, 2004 4:24 AM
> To: PerlDiscuss - Perl Newsgroups and mailing lists
> Subject: Re: Negative Look-ahead problem
>
> PerlDiscuss - Perl Newsgroups and mailing lists wrote:
>
>>     reserved_word: 'AND' | 'OR'
>>     WORD: ...!reserved_word /[a-z0-9]+/i
>>
>> Using the above grammar, rule WORD fails to match any token that starts with a reserved word (e.g. Andy). That is not what I expected, since in 'demo_Cgrammar.pl' we have:
>>
>>     IDENTIFIER: ...!reserved_word /[a-z]\w*/i
>>
>> Am I missing something?
>>
>> Thanks!
>> Andras
>
> hmm, did you tweak the skip pattern?
>
> Regards
> Charly
Re: skip: over comments and misleading error messages
Hi Yves,
(and perhaps Damian, if he has time to read to the end, sorry)

Orton, Yves wrote:

>> oh fine, the next time I will look first in the source!
>
> No. Read the docs first. It's there... :-)

Oh god, I read it, at least two times, and also the FAQ. The next time I will use grep, sigh.

And later on in the doc (shame on me, but anyway it doesn't solve the problem with misleading error messages):

    Terminal Separators

    For the purpose of matching, each terminal in a production is considered to be preceded by a ``prefix'' - a pattern which must be matched before a token match is attempted. By default, the prefix is optional whitespace (which always matches, at least trivially), but this default may be reset in any production.

    The variable $Parse::RecDescent::skip stores the universal prefix, which is the default for all terminal matches in all parsers built with Parse::RecDescent.

    The prefix for an individual production can be altered by using the <skip:...> directive (see below).

but, and that was my problem, this is no longer mentioned under the explanation of the skip directive.

>> but not proper matching, it's already stopping at the first line comment and therefore you get the ERROR messages that you get.
>
> Er, I don't understand you. That pattern will skip all line comments and whitespace. (Well, actually P::RD will match that regex repeatedly, as many times as necessary.)
No, look at this code (your regex), the trace and the output:

    #!/usr/bin/perl -w
    use Parse::RecDescent;
    $Parse::RecDescent::skip = qr{(^\s+|#.*$)+};
    $RD_TRACE = 1;

    my $grammar = <<'EOGRAMMAR';
    file: int(s) /\z/
        | <error>
    int : /[+-]?\d+/
        | <error>
    EOGRAMMAR

    my $parser = Parse::RecDescent->new($grammar);

    my $text = <<'EOTEXT';
    # comment
    .123
    EOTEXT

    my $result = $parser->file($text);

    Parse::RecDescent: Treating file : as a rule declaration
    Parse::RecDescent: Treating int(s) as a one-or-more subrule match
    Parse::RecDescent: Treating /\z/ as a /../ pattern terminal
    Parse::RecDescent: Treating | <error> as a new (error) production
    Parse::RecDescent: Treating <error> as an error marker
    Parse::RecDescent: Treating int : as a rule declaration
    Parse::RecDescent: Treating /[+-]?\d+/ as a /../ pattern terminal
    Parse::RecDescent: Treating | <error> as a new (error) production
    Parse::RecDescent: Treating <error> as an error marker
    printing code (10158) to RD_TRACE

    | file |Trying rule: [file]                 |
    | file |                                    |# comment\n .123\n
    | file |Trying production: [int /\z/]       |
    | file |Trying repeated subrule: [int]      |
    | int  |Trying rule: [int]                  |
    | int  |Trying production: [/[+-]?\d+/]     |
    | int  |Trying terminal: [/[+-]?\d+/]       |
    | int  |Didn't match terminal               |
    | int  |                                    |# comment\n .123\n
    | int  |Trying production: [<error...>]     |
    | int  |                                    |# comment\n .123\n
    | int  |Trying directive: [<error...>]      |
    | int  |                                    |# comment\n .123\n
    | int  |Didn't match directive              |
    | int  |Didn't match rule                   |
    | file |Didn't match repeated subrule:      |
    |      |[int]                               |
    | file |                                    |# comment\n .123\n
    | file |Trying production: [<error...>]     |
    | file |Trying directive: [<error...>]      |
    | file |Didn't match directive              |
    | file |Didn't match rule                   |

    ERROR (line 1): Invalid int: Was expecting /[+-]?\\d+/
    ERROR (line 1): Invalid file: Was expecting int

As you can see, with this skip regex the parser never skips over the first comment, and therefore the error message is correct (everything happened in line 1).

And with my flattened regex (thanks to your hints!):
    #!/usr/bin/perl -w
    use Parse::RecDescent;
    $Parse::RecDescent::skip = qr{(\s+|#.*)+};
    $RD_TRACE = 1;

    my $grammar = <<'EOGRAMMAR';
    file: int(s) /\z/
        | <error>
    int : /[+-]?\d+/
        | <error>
    EOGRAMMAR

    my $parser = Parse::RecDescent->new($grammar);

    my $text = <<'EOTEXT';
    # comment
    .123
    EOTEXT

    my $result = $parser->file($text);

    Parse::RecDescent: Treating file : as a rule declaration
    Parse::RecDescent: Treating int(s) as a one-or-more subrule match
    Parse::RecDescent: Treating /\z/ as a /../ pattern terminal
    Parse::RecDescent: Treating | <error> as a new (error) production
    Parse::RecDescent: Treating <error> as an error marker
    Parse::RecDescent: Treating int : as a rule declaration
    Parse::RecDescent: Treating /[+-]?\d+/ as a /../ pattern terminal
    Parse::RecDescent: Treating |
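The difference between the two skip patterns can also be seen in plain Perl, outside of P::RD. The sketch below assumes that the prefix is tried anchored at the start of the remaining text (emulated here with \A, which is an assumption about how the prefix is applied and not a copy of P::RD internals). The input string is modeled on the test text above, and strip_prefix is a hypothetical helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: removes a leading match of $skip from $text.
# Returns the remaining text, or undef if the prefix does not match at all.
sub strip_prefix {
    my ($skip, $text) = @_;
    return $text =~ s/\A$skip// ? $text : undef;
}

my $text = "# comment\n.123\n";

my %patterns = (
    # Without /m, ^ and $ refer to the start and end of the whole string,
    # so '#.*$' cannot end at the newline in the middle of the text.
    anchored  => qr{(^\s+|#.*$)+},
    flattened => qr{(\s+|#.*)+},
);

for my $name ('anchored', 'flattened') {
    my $rest = strip_prefix($patterns{$name}, $text);
    if (defined $rest) {
        print "$name: prefix skipped, remaining text starts with '",
            substr($rest, 0, 4), "'\n";
    }
    else {
        print "$name: prefix did not match\n";
    }
}
```

In this emulation the anchored pattern does not match at all, while the flattened pattern consumes the comment and the following newline and leaves '.123' as the next thing to parse.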
Re: Feature wishlist for P::RD 2.0 or perhaps 1.81
Hi Jonathan,

Jonathan Mayer wrote:

> Apologies for butting in where my opinion is not asked for, but ...

You're welcome, it's a mailing list, isn't it?

>>     statement: A! | B! | C | D
>>
>> is so easy to understand: at least A and B, optionally C and/or D, but without ORDER, in comparison to action codes and greps and maps and line noise.
>
> At what point does A! become mandatory? In the
>
>     block: statement(s)    /* A or B must be part of block */
>
> construct? Or in the
>
>     program: block(s)      /* A or B must be part of program */
>     block: statement(s)
>
> construct?
>
> There are times when both forms are useful -- but defining mandatory as part of the syntax for the singular statement construct is limiting.

Hmmm, what is wrong if it is required for the block statement? Then it is automatically true for program.

> Also, what if the programmer wants a more complicated logical function on the set of statements that comprises a minimal block?
>
> It seems to me, P::RD already has the functionality you desire, in a much more flexible form. What's wrong with:
>
>     statement: A | B | C | D
>     block: { statement(s) }
>            { /* some code to test for the presence of A and B,
>                 else return undef */ }

The syntax is then scattered between grammar rules and action code, and the presence check isn't always as intuitive as in this primitive example. A and B are also complex subrules, and it is really only a question of style.

> I'd hate to see P::RD fall into the same trap as regexps: P::RD doesn't need to be a complete programming language. P::RD is fine as a Perl accessory.

Yes, it's really fine, even if it stays as it already is.

Regards
Charly

-- 
Karl Gaissmaier      Computing Center, University of Ulm, Germany
Email:[EMAIL PROTECTED]      Network Administration
Tel.: ++49 731 50-22499
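The kind of action-code presence check that Jonathan alludes to might look roughly like the following plain-Perl sketch. It assumes, purely for illustration, that each parsed statement has already been reduced to a kind label such as 'A' or 'B'; has_required_kinds and the labels are hypothetical and not part of P::RD.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if every required kind occurs at least once
# among the kinds seen, regardless of order.
sub has_required_kinds {
    my ($required, $seen) = @_;
    my %seen    = map { $_ => 1 } @$seen;
    my @missing = grep { !$seen{$_} } @$required;
    return @missing ? 0 : 1;
}

# Kinds of the statements found in one block (made-up sample data).
my @block = qw(C A D B);

print has_required_kinds([qw(A B)], \@block)
    ? "block ok\n"
    : "block rejected\n";
```

Inside a grammar action, a failed check would return undef so that the production fails, as in the comment in Jonathan's example. This also illustrates Charly's objection: the rule about mandatory statements lives in the action code rather than in the grammar itself.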