converting a BNF to PRD

Jim Cromie Wed, 14 Jan 2004 01:29:20 -0800

Hi folks,

I've been toying with PRD recently. I'm new at it and would like a reality check. I'm trying to use it to convert one BNF flavor into a form that can be fed into PRD, with the resulting parser being used to parse SQL.

This 2-stage approach seems appropriate because the BNF file is ~3K, and I don't want to convert it manually and potentially have to maintain it. The conversion also seems relatively small compared to the 2nd stage; I hope to gain enough competence in the 1st stage that the 2nd begins to look more practical. Finally, I'd like to think that this BNF can then be supplemented with vendor-specific subclasses.

I also wonder if this might be generally useful/instructive enough to go in the distribution's demo/ subdir.

What I've written so far handles the grammar items below well enough to start with, but I've yet to devise the set of actions that will produce the PRD version. That will probably yield to some effort.

<simple Latin letter> ::=
     <simple Latin upper case letter>
   | <simple Latin lower case letter>

<simple Latin upper case letter> ::=
         A | B | C | D | E | F | G | H | I | J | K | L | M | N | O
   | P | Q | R | S | T | U | V | W | X | Y | Z

<simple Latin lower case letter> ::=
         a | b | c | d | e | f | g | h | i | j | k | l | m | n | o
   | p | q | r | s | t | u | v | w | x | y | z

I'm not clear on the meaning and/or handling of these constructs in the BNF.

This looks like a required alternation, with the trailing ... meaning
"repeated one or more times":

   <separator> ::= { <comment> | <space> | <newline> }...

I think this translates to the following, but I'm slightly concerned that
the form of the rules has changed, and the conversion is not so simple.
If there's a way to keep a 1:1 correspondence of the superficial structure,
then it would keep the 1st stage simple.

   separators : separator(s)
   separator : comment | space | newline
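
One way the rule name might be kept as-is is to push the alternation into a generated helper rule. The following is only a rough plain-Perl sketch of that idea, not part of the script below: the `_alt` suffix and both function names are made up for illustration, and it only handles a rule whose whole body is a single `{ ... }...` group.

```perl
use strict;
use warnings;

# Turn "<simple comment>" into "simple_comment" (same idea as the
# identifier rule in the metagrammar: strip <>, change \s+ to '_').
sub prd_name {
    my ($bnf_name) = @_;
    $bnf_name =~ s/^<|>$//g;
    $bnf_name =~ s/\s+/_/g;
    return $bnf_name;
}

# Convert one rule of the form   <name> ::= { <a> | <b> }...
# into two PRD rules.  Returns nothing if the body has another shape.
# The "_alt" helper-rule suffix is a hypothetical naming convention.
sub convert_required_repeat {
    my ($rule) = @_;
    my ($name, $alts) =
        $rule =~ /^\s*(<[^>]+>)\s*::=\s*\{\s*(.+?)\s*\}\s*\.\.\.\s*$/s
        or return;
    my $prd   = prd_name($name);
    my $inner = join ' | ', map { prd_name($_) } split /\s*\|\s*/, $alts;
    return "$prd : ${prd}_alt(s)\n${prd}_alt : $inner\n";
}

print convert_required_repeat(
    '<separator> ::= { <comment> | <space> | <newline> }...');
```

With this shape, `separator` still means "one or more" as in the BNF, and only the helper rule is new.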

This looks like optional, repeating:

<simple comment> ::=
   <simple comment introducer> [ <comment character>... ] <newline>

translating to

simple_comment : simple_comment_introducer comment_char(s?) newline
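
The bracket and ellipsis forms could probably be rewritten mechanically into PRD repetition specifiers. Below is a rough plain-Perl sketch under that assumption; `convert_items` and `prd_name` are hypothetical helpers, and only brackets around a single `<name>` are handled (no nesting, no alternation inside the brackets). Names are derived mechanically, so it yields `comment_character` where I abbreviated by hand above.

```perl
use strict;
use warnings;

# Turn "<simple comment>" into "simple_comment".
sub prd_name {
    my ($bnf_name) = @_;
    $bnf_name =~ s/^<|>$//g;
    $bnf_name =~ s/\s+/_/g;
    return $bnf_name;
}

# Rewrite the items of one BNF rule body into PRD subrule calls.
sub convert_items {
    my ($body) = @_;
    # [ <x>... ]   optional, repeating  ->  x(s?)
    $body =~ s/\[\s*(<[^>]+>)\s*\.\.\.\s*\]/prd_name($1) . '(s?)'/ge;
    # [ <x> ]      optional             ->  x(?)
    $body =~ s/\[\s*(<[^>]+>)\s*\]/prd_name($1) . '(?)'/ge;
    # <x>...       one or more          ->  x(s)
    $body =~ s/(<[^>]+>)\s*\.\.\./prd_name($1) . '(s)'/ge;
    # any remaining plain <x>           ->  x
    $body =~ s/(<[^>]+>)/prd_name($1)/ge;
    return $body;
}

print convert_items(
    '<simple comment introducer> [ <comment character>... ] <newline>'),
    "\n";
```

The order of the substitutions matters: the most specific bracketed forms are consumed first, so the plain `<x>` rule at the end only sees what is left.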

I don't know what to make of this:

<nonquote character> ::= !! <EMPHASIS>(See the Syntax Rules.)

This looks like a syntax error. Are there other interpretations?

<general set function> ::=
   <set function type> <left paren> [ <set quantifier> ] <value expression> <right paren> ]
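
Oddities like a stray closing bracket could be flagged before the file ever reaches PRD. This is a rough plain-Perl sketch of such a check, not something the script below does; `bracket_problems` is a hypothetical helper, and it assumes that `[ ]` and `{ }` are the only grouping brackets and that `<...>` names never contain them.

```perl
use strict;
use warnings;

# Report unbalanced [ ] and { } in one BNF rule body.
# <...> names are removed first so only grouping brackets are examined.
sub bracket_problems {
    my ($body) = @_;
    my %closer_for = ('[' => ']', '{' => '}');
    my %is_closer  = map { $_ => 1 } values %closer_for;

    (my $groups = $body) =~ s/<[^>]*>//g;

    my (@open, @problems);
    for my $ch (split //, $groups) {
        if (exists $closer_for{$ch}) {
            push @open, $ch;
        }
        elsif ($is_closer{$ch}) {
            my $opener = pop @open;
            if (!defined $opener) {
                push @problems, "unmatched '$ch'";
            }
            elsif ($closer_for{$opener} ne $ch) {
                push @problems, "'$opener' closed by '$ch'";
            }
        }
    }
    push @problems, map { "unclosed '$_'" } @open;
    return @problems;
}

my $body = '<set function type> <left paren> [ <set quantifier> ] '
         . '<value expression> <right paren> ]';
print "$_\n" for bracket_problems($body);
```

Run over each rule body in turn, this would list the rules worth checking by hand against the published grammar.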

#!/usr/local/bin/perl -w


# 1st attempt at a PRD based SQL92::Parser
# to eventually parse these:
# http://www.contrib.andrew.cmu.edu/~shadow/sql/sql2bnf.aug92.txt
# http://www.contrib.andrew.cmu.edu/~shadow/sql/sql3bnf.sep93.txt

my $dd;
use Data::Dumper::EasyOO (init => \$dd, autoprint => 1, indent => 1);

use Parse::RecDescent;
use Getopt::Std;

# two candidate inputs; the second assignment wins
my $bnffile = 'sql2bnf.aug92.txt';
$bnffile = 'sql3bnf.sep93.txt';

my $sqlgrammar;
my $samplegrammar = q{

This is a cut-paste from one of above files.
It illustrates the bnf syntax used there, which must be adapted
before feeding to P::RD

<SQL language character> ::=
      <simple Latin letter>
    | <digit>
    | <SQL special character>

<simple Latin letter> ::=
      <simple Latin upper case letter>
    | <simple Latin lower case letter>

<simple Latin upper case letter> ::=
      A | B | C | D | E | F | G | H | I | J | K | L | M | N | O
    | P | Q | R | S | T | U | V | W | X | Y | Z

<simple Latin lower case letter> ::=
      a | b | c | d | e | f | g | h | i | j | k | l | m | n | o
    | p | q | r | s | t | u | v | w | x | y | z

<digit> ::=
    0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

};

getopts('bvthaTm', my $opt={}) or die
    qq{
        b : load sql-bnf
        v : verbose - echoes the grammar file
        t : turn on trace
        h : turn on hints
        a : autoaction
        T : autoTree   
        m : print dump of metaparser
    };


if ($opt->{b}) {
    local $/ = undef;
    open (my $fh, '<', $bnffile) or die "can't open file: $bnffile: $!\n";
    $sqlgrammar = <$fh>;
} else {
    $sqlgrammar = $samplegrammar;
}

# alter the grammar syntax used in bnf-file.  Do this at load time,
# rather than hacking the file by hand, and maintaining it forever..

# 1st step is to comment-ize the leading text.  Doing this might be
# simple in PRD (tips welcome), but plain perl works, and needs no
# magic/wizardry.

my $prologue = '';      # stays empty if there is no leading text
if ($sqlgrammar =~ s/^(.+?\n\s)</</sm) {
    # fix up leading comments
    $prologue = join("\n# ", '',split(/\n/, $1),"\n");
    #print "INTRO: $prologue\n";
}

print "GRAMMAR:\n $prologue $sqlgrammar\nEO_GRAMMAR\n" if $opt->{v};

# now try to use PRD to morph the grammar syntax used in the bnf file,
# rather than hack at it mercilessly with regexes.

$::RD_TRACE = 1   if $opt->{t};
$::RD_HINT = 1    if $opt->{h};

$::RD_AUTOACTION = q/{ bless \%item, $item[0] }/   if $opt->{a};

my $metagrammar = q{
    # metagrammar, shamelessly adapted from PRD pod

        grammar    : ruledefs(s)

        ruledefs :
            identifier  isdefined  production
            #{ "$item{identifier}  $item{isdefined}  $item{production}" }

        isdefined  :
            /::=\s*/    { ':' }

        identifier :
            /<(\w+(\s+\w+)*)>/i                 # <this is the form>
            { my $name = $1; $name =~ s/\s+/_/g; $name }   # strip <>, chg \s+ to '_'

        production :
            subrule(s /\|/)
            | altitems
            | item(s)

        altitems :
            item(s /\|/)

        item :
            subrule #args(?)            # match another rule
            | altterms
            | bareword                  # match the next input

        subrule :
            identifier(s /\|/)  # the name of the rule

        #term       : /(.*)\n\n/        { "qr/$1/x" }

        altterms :
            bareword(s /\|/)
            { ::crunchregex($item{'bareword(s)'}) }
 
        bareword   : /(\w+)/ { "$1" }
};

sub crunchregex {
    my ($stuff) = shift;
    $dd->(crunching => $stuff);
    my $res;
    if (not grep {length($_) != 1} @$stuff) {
        # all items are single char,
        $res = "/[" . join ('', @$stuff) . "]/";
    }
    else { $res = $stuff };

    $dd->(crunched =>$res);
    return $res;
}

$metagrammar = "<autotree>" . $metagrammar if $opt->{T};

my $metaparser = Parse::RecDescent->new($metagrammar)
    or die "metagrammar failed to compile\n";
$dd->(metaparser => $metaparser) if $opt->{m};

#my $result = $metaparser->altitems($sqlgrammar);
my $result = $metaparser->grammar($sqlgrammar);
$dd->(metagrammar => $result);

#$parser = new Parse::RecDescent ($metagrammar);

##############################

=head1 Notes

Here is a sample of the grammar at the URLs at the top.
The differences I see so far are as follows:

  * rule-name-terminator is '::=', not ':'
    I think I've fixed this in the meta-grammar

  * rule-names are <bracketed> but have spaces.
    I've altered the identifier rule, but I'm considering using
    Regexp::Common: /$RE{balanced}{-parens=>'<>'}/ as a plan-B

  * literals are not quoted.
    The simplest answer here is probably an implicit qr//x;
    I've defined a terminal rule to do this.


=head1 Bugs

Heh. Before you have bugs, you've got to have 'something' that works,
even a little bit.  That said, this appears to be a problem:



        grammar    : rule(s)
        rule       : identifier /::=\s*/ production
        { "$item{identifier} = $item{production}" } 

        identifier : /<([A-Za-z]\w*(\s+\w+)*)>/i        # <this is the form>
        { my $name = $1; $name =~ s/\s+/_/g; $name }    # strip <>, chg \s+ to '_'

        production : altitems
        { $item{altitems} }

        altitems   : bareword(s /\|/)
        { ::crunchregex($item{'bareword(s)'}) }
 
        #item      : bareword                          # match the next input

        bareword   : /(\w+)/ { "$1" }
