Ops Tree.pm

Steve Fink Wed, 24 Nov 2004 20:38:13 -0800

cvsuser     04/11/24 20:38:00


  Modified:    languages/regex README
               languages/regex/lib/Regex Driver.pm Grammar.pm Grammar.y
               languages/regex/lib/Regex/CodeGen IMCC.pm Perl5.pm
               languages/regex/lib/Regex/Ops Tree.pm
  Log:
  Embedded code! Good toy.
  
  Revision  Changes    Path
  1.15      +31 -29    parrot/languages/regex/README
  
  Index: README
  ===================================================================
  RCS file: /cvs/public/parrot/languages/regex/README,v
  retrieving revision 1.14
  retrieving revision 1.15
  diff -u -r1.14 -r1.15
  --- README    25 Nov 2004 03:51:25 -0000      1.14
  +++ README    25 Nov 2004 04:37:55 -0000      1.15
  @@ -119,6 +119,8 @@
    [a-z]   - character classes
    {n,m}   - greedily match n..m times
    {n,m}?  - nongreedily match n..m times
  + <rule>  - rule invocation (Perl6 feature)
  + (?{ })  - embedded code (just {...} in Perl6)
   
   Missing perl5 features:
    \w      - named character classes
  @@ -132,13 +134,11 @@
    (?!R)   - negative look-ahead assertion
    (?<=)   - look-behind assertion
    (?<!)   - negative look-behind assertion
  - (?{ })  - embedded code
    (??{ }) - match-time evaluated subexpression
    (?>R)   - independent subexpression
    (?(cond)R|S)
            - conditional expression
    (R?)*   - empty match suppression
  - <rule>  - rule invocation (Perl6 feature)
   
   Missing perl6 features:
   :
  @@ -252,42 +252,44 @@
   ================
   
   This rule compiler is not the officially blessed one. That would be
  -Patrick Michaud's p6ge, which you can find in compilers/p6ge. This
  -engine predates p6ge by a year or two, but never managed to generate
  -sufficient interest to get anyone else involved. Patrick must have
  -been aware of this engine, since I told him about it both personally
  -and in a message to perl6-internals, but he has never acknowledged its
  -existence nor explained why he felt the need to start from scratch. I
  -have to conclude that he either looked at it and didn't like the
  -design or the implementation; or he just wanted to start from scratch
  -so that he could fully understand the system he was working on. All of
  -which are perfectly good reasons, so I bear no ill will towards the
  -official effort.
  +Patrick Michaud's PGE (Parrot Grammar Engine) , which you can find in
  +compilers/pge. This engine (in languages/regex, referred to in the
  +following text as l/rx) predates pge by a year or two, but never
  +managed to generate sufficient interest to get anyone else involved.
  +Patrick must have been aware of this engine, since I told him about it
  +both personally and in a message to perl6-internals, but he has never
  +acknowledged its existence nor explained why he felt the need to start
  +from scratch. I have to conclude that he either looked at it and
  +didn't like the design or the implementation; or he just wanted to
  +start from scratch so that he could fully understand the system he was
  +working on. All of which are perfectly good reasons, so I bear no ill
  +will towards the official effort.
   
  -I am assuming that p6ge is going to get the momentum of the community
  +I am assuming that pge is going to get the momentum of the community
   behind it, so I would advise anyone interested in working on a rule
   engine to look there first. (Look for discussion on the
   perl6-internals and perl6-compiler mailing lists.) However, I still
   intend to work on this engine for a while longer, and welcome any
   interested participants. (Send any requests/comments/suggestions
   either to perl6-internals or directly to me at [EMAIL PROTECTED]) So
  -far, I have only briefly looked at p6ge, but I think this
  +far, I have only briefly looked at pge, but I think this
   languages/regex engine has enough of a different approach that it is
   still valuable for gathering lessons -- and may still make the most
   sense in the long run.
   
   That last statement demands a bit of explanation, so here's an excerpt
  -of a mail I sent out after my first look at p6ge (remember, p6ge has
  +of a mail I sent out after my first look at pge (remember, pge has
   probably advanced past this point by now):
   
  -It sounds like languages/regex handles pretty much exactly the same
  -things as p6ge, probably a few more and a few less. I haven't actually
  -looked at the code, but from the description I'd guess that the main
  -differences are:
  +It sounds like l/rx handles pretty much exactly the same things as
  +pge, probably a few more and a few less. I haven't actually looked at
  +the code, but from the description I'd guess that the main differences
  +are:
    
  -  - p6ge is implemented in C; l/rx in Perl5
  -  - p6ge generates PIR; l/rx has both PIR and Perl5 backends
  -  - p6ge uses coroutines and continuations; I have always been too
  +  - pge is implemented in C; l/rx in Perl5. Both are reaching towards
  +    the "bootstrap point", when they'll be implemented in PIR.
  +  - pge generates PIR; l/rx has both PIR and Perl5 backends
  +  - pge uses coroutines and continuations; I have always been too
       wary of their stability, so I use plain subs (with a 'mode'
       parameter to tell it whether to try to match or backtrack)
     - Both allow you to "continue" a match to find all other possible
  @@ -296,18 +298,18 @@
       anyway)
     - l/rx uses match objects (dynclasses/match.pmc) and automatically
       generates a parse tree out of them
  -  - p6ge has built-in "dump out the matching info" routines; I make my
  +  - pge has built-in "dump out the matching info" routines; I make my
       test harnesses generate their own. I'm jealous.
  -  - The feature sets are nearly identical.Makes sense, I suppose --
  +  - The feature sets are nearly identical. Makes sense, I suppose --
       low-hanging fruit and all that.
     - It sounds like the internal design is rather different. I try hard
  -    to compile down to very minimalistic PIR ops. It sounds like p6ge
  +    to compile down to very minimalistic PIR ops. It sounds like pge
       uses lots of higher-level operations, to do things like processing
  -    a whole chunk at a time. (Although on the other hand, p6ge uses
  +    a whole chunk at a time. (Although on the other hand, pge uses
       more native Parrot flow control mechanisms than I do.)
     - Closely related to the above, I have a number of optimizations
  -    already implemented, but I suspect p6ge will end up with a very
  +    already implemented, but I suspect pge will end up with a very
       different set of optimizations.
     - I have on average about 5 hours a week to work on l/rx; Patrick
       has quite a bit more :-) (Which does NOT mean that I work faster;
  -    my engine is at least a year older than p6ge.)
  +    my engine is at least a year older than pge.)
  
  
  
  1.3       +10 -0     parrot/languages/regex/lib/Regex/Driver.pm
  
  Index: Driver.pm
  ===================================================================
  RCS file: /cvs/public/parrot/languages/regex/lib/Regex/Driver.pm,v
  retrieving revision 1.2
  retrieving revision 1.3
  diff -u -r1.2 -r1.3
  --- Driver.pm 25 Nov 2004 03:00:30 -0000      1.2
  +++ Driver.pm 25 Nov 2004 04:37:57 -0000      1.3
  @@ -109,6 +109,16 @@
   package Regex::Driver::PIR;
   our @ISA = qw(Regex::Driver);
   
  +# sub output_header {
  +#     my ($self, $fh) = @_;
  +#     print $fh <<'END';
  +# .sub __init_regex @LOAD
  +#   loadlib $P0, "match_group"
  +#   .return ()
  +# .end
  +# END
  +# }
  +
   sub output_rule_body {
       my ($self, $fh, $subname, $rule, $ctx, $instructions) = @_;
       print $fh join("\n", @$instructions), "\n";
  
  
  
  1.12      +345 -299  parrot/languages/regex/lib/Regex/Grammar.pm
  
  Index: Grammar.pm
  ===================================================================
  RCS file: /cvs/public/parrot/languages/regex/lib/Regex/Grammar.pm,v
  retrieving revision 1.11
  retrieving revision 1.12
  diff -u -r1.11 -r1.12
  --- Grammar.pm        25 Nov 2004 03:00:30 -0000      1.11
  +++ Grammar.pm        25 Nov 2004 04:37:57 -0000      1.12
  @@ -497,20 +497,46 @@
   
   sub tokenize {
       my $data = shift;
  -    my @tokens = $data =~ /(\\.|.)/g;
  +
  +    my @tokens;
       my @types;
  -    foreach (@tokens) {
  -        if (/^\\(.)/) {
  -            $_ = $1;
  -            push @types, 'CHAR';
  -        } elsif (/\d/) {
  -            push @types, 'NUM';
  -        } elsif (/[\w\s]/) {
  +    my $state = 0;
  +    my $depth = 0;
  +    for my $c (split(//, $data)) {
  +        if ($state == 0) {
  +            if ($c eq '\\') {
  +                $state = 1;
  +            } elsif ($c eq '{') {
  +                push @tokens, '{';
  +                push @types, 'CODE';
  +                $state = 2;
  +                $depth = 1;
  +            } else {
  +                push @tokens, $c;
  +                if ($c =~ /\d/) {
  +                    push @types, 'NUM';
  +                } elsif ($c =~ /[\w\s]/) {
  +                    push @types, 'CHAR';
  +                } else {
  +                    push @types, $c;
  +                }
  +            }
  +        } elsif ($state == 1) {
  +            push @tokens, $c;
               push @types, 'CHAR';
  +            $state = 0;
           } else {
  -            push @types, $_;
  +            if ($c eq '{') {
  +                ++$depth;
  +            } elsif ($c eq '}') {
  +                if (--$depth == 0) {
  +                    $state = 0;
  +                }
  +            }
  +            $tokens[-1] .= $c;
           }
       }
  +
       return [EMAIL PROTECTED], [EMAIL PROTECTED];
   }
   
  @@ -542,17 +568,18 @@
                        'CHAR' => 4,
                        "<" => 3,
                        "&" => 5,
  -                     "^" => 8,
  -                     "(" => 9,
  -                     "[" => 13,
  -                     "." => 14
  +                     'CODE' => 6,
  +                     "^" => 9,
  +                     "(" => 10,
  +                     "[" => 14,
  +                     "." => 15
                },
                GOTOS => {
  -                     'charclass' => 11,
  -                     'expr' => 10,
  -                     'regex' => 12,
  -                     'rule' => 7,
  -                     'regex1' => 6
  +                     'charclass' => 12,
  +                     'expr' => 11,
  +                     'regex' => 13,
  +                     'rule' => 8,
  +                     'regex1' => 7
                }
        },
        {#State 2
  @@ -560,457 +587,470 @@
        },
        {#State 3
                ACTIONS => {
  -                     "?" => 15
  +                     "?" => 16
                },
  -             DEFAULT => -27,
  +             DEFAULT => -28,
                GOTOS => {
  -                     'rulename' => 16
  +                     'rulename' => 17
                }
        },
        {#State 4
                DEFAULT => -11
        },
        {#State 5
  -             DEFAULT => -27,
  +             DEFAULT => -28,
                GOTOS => {
  -                     'rulename' => 17
  +                     'rulename' => 18
                }
        },
        {#State 6
  -             DEFAULT => -6
  +             DEFAULT => -26
        },
        {#State 7
  -             DEFAULT => -1
  +             DEFAULT => -6
        },
        {#State 8
  +             DEFAULT => -1
  +     },
  +     {#State 9
                ACTIONS => {
  -                     "(" => 9,
  +                     "(" => 10,
                        'CHAR' => 4,
                        "<" => 3,
  -                     "[" => 13,
  -                     "." => 14
  +                     "[" => 14,
  +                     'CODE' => 6,
  +                     "." => 15
                },
                GOTOS => {
  -                     'expr' => 10,
  -                     'charclass' => 11,
  -                     'regex1' => 18
  +                     'expr' => 11,
  +                     'charclass' => 12,
  +                     'regex1' => 19
                }
        },
  -     {#State 9
  +     {#State 10
                ACTIONS => {
  -                     "?" => 19
  +                     "?" => 20
                },
                DEFAULT => -21,
                GOTOS => {
  -                     '@1-1' => 20
  +                     '@1-1' => 21
                }
        },
  -     {#State 10
  +     {#State 11
                ACTIONS => {
  -                     "?" => 21,
  +                     "?" => 22,
                        'CHAR' => 4,
                        "<" => 3,
  -                     "+" => 22,
  -                     "{" => 23,
  -                     "(" => 9,
  -                     "|" => 25,
  -                     "*" => 26,
  -                     "\$" => 28,
  -                     "[" => 13,
  -                     "." => 14
  +                     "+" => 23,
  +                     "{" => 24,
  +                     'CODE' => 6,
  +                     "|" => 26,
  +                     "(" => 10,
  +                     "*" => 27,
  +                     "\$" => 29,
  +                     "[" => 14,
  +                     "." => 15
                },
                DEFAULT => -7,
                GOTOS => {
  -                     'expr' => 27,
  -                     'charclass' => 11,
  -                     'range' => 24
  +                     'expr' => 28,
  +                     'charclass' => 12,
  +                     'range' => 25
                }
        },
  -     {#State 11
  +     {#State 12
                DEFAULT => -12
        },
  -     {#State 12
  +     {#State 13
                DEFAULT => -4
        },
  -     {#State 13
  +     {#State 14
                ACTIONS => {
  -                     "-" => 30,
  -                     "^" => 31
  +                     "-" => 31,
  +                     "^" => 32
                },
  -             DEFAULT => -39,
  +             DEFAULT => -40,
                GOTOS => {
  -                     'classpieces' => 29
  +                     'classpieces' => 30
                }
        },
  -     {#State 14
  -             DEFAULT => -37
  -     },
        {#State 15
  -             DEFAULT => -27,
  -             GOTOS => {
  -                     'rulename' => 32
  -             }
  +             DEFAULT => -38
        },
        {#State 16
  -             ACTIONS => {
  -                     'CHAR' => 33,
  -                     ">" => 34
  +             DEFAULT => -28,
  +             GOTOS => {
  +                     'rulename' => 33
                }
        },
        {#State 17
                ACTIONS => {
  -                     'CHAR' => 33,
  -                     "=" => 35
  +                     'CHAR' => 34,
  +                     ">" => 35
                }
        },
        {#State 18
  -             DEFAULT => -5
  +             ACTIONS => {
  +                     'CHAR' => 34,
  +                     "=" => 36
  +             }
        },
        {#State 19
  +             DEFAULT => -5
  +     },
  +     {#State 20
                ACTIONS => {
  -                     ":" => 36
  +                     ":" => 37
                }
        },
  -     {#State 20
  +     {#State 21
                ACTIONS => {
  -                     "(" => 9,
  +                     "(" => 10,
                        'CHAR' => 4,
                        "<" => 3,
  -                     "[" => 13,
  -                     "." => 14
  +                     "[" => 14,
  +                     'CODE' => 6,
  +                     "." => 15
                },
                GOTOS => {
  -                     'expr' => 37,
  -                     'charclass' => 11
  +                     'expr' => 38,
  +                     'charclass' => 12
                }
        },
  -     {#State 21
  +     {#State 22
                ACTIONS => {
                        "?" => undef
                },
                DEFAULT => -17
        },
  -     {#State 22
  +     {#State 23
                ACTIONS => {
  -                     "?" => 39
  +                     "?" => 40
                },
                DEFAULT => -15
        },
  -     {#State 23
  +     {#State 24
                ACTIONS => {
  -                     'NUM' => 42,
  -                     "," => 41
  +                     'NUM' => 43,
  +                     "," => 42
                },
                GOTOS => {
  -                     'number' => 40
  +                     'number' => 41
                }
        },
  -     {#State 24
  +     {#State 25
                ACTIONS => {
  -                     "?" => 43
  +                     "?" => 44
                },
                DEFAULT => -19
        },
  -     {#State 25
  +     {#State 26
                ACTIONS => {
  -                     "(" => 9,
  +                     "(" => 10,
                        'CHAR' => 4,
                        "<" => 3,
  -                     "[" => 13,
  -                     "." => 14
  +                     "[" => 14,
  +                     'CODE' => 6,
  +                     "." => 15
                },
                GOTOS => {
  -                     'expr' => 44,
  -                     'charclass' => 11
  +                     'expr' => 45,
  +                     'charclass' => 12
                }
        },
  -     {#State 26
  +     {#State 27
                ACTIONS => {
  -                     "?" => 45
  +                     "?" => 46
                },
                DEFAULT => -13
        },
  -     {#State 27
  +     {#State 28
                ACTIONS => {
  -                     "?" => 21,
  +                     "?" => 22,
                        'CHAR' => 4,
                        "<" => 3,
  -                     "+" => 22,
  -                     "{" => 23,
  -                     "(" => 9,
  -                     "*" => 26,
  -                     "[" => 13,
  -                     "." => 14
  +                     "+" => 23,
  +                     "{" => 24,
  +                     'CODE' => 6,
  +                     "(" => 10,
  +                     "*" => 27,
  +                     "[" => 14,
  +                     "." => 15
                },
                DEFAULT => -10,
                GOTOS => {
  -                     'expr' => 27,
  -                     'charclass' => 11,
  -                     'range' => 24
  +                     'expr' => 28,
  +                     'charclass' => 12,
  +                     'range' => 25
                }
        },
  -     {#State 28
  +     {#State 29
                DEFAULT => -8
        },
  -     {#State 29
  +     {#State 30
                ACTIONS => {
  -                     'NUM' => 48,
  -                     'CHAR' => 46,
  -                     "]" => 49
  +                     'NUM' => 49,
  +                     'CHAR' => 47,
  +                     "]" => 50
                },
                GOTOS => {
  -                     'classpiece' => 47
  -             }
  -     },
  -     {#State 30
  -             DEFAULT => -39,
  -             GOTOS => {
  -                     'classpieces' => 50
  +                     'classpiece' => 48
                }
        },
        {#State 31
  -             DEFAULT => -39,
  +             DEFAULT => -40,
                GOTOS => {
                        'classpieces' => 51
                }
        },
        {#State 32
  -             ACTIONS => {
  -                     'CHAR' => 33,
  -                     ">" => 52
  +             DEFAULT => -40,
  +             GOTOS => {
  +                     'classpieces' => 52
                }
        },
        {#State 33
  -             DEFAULT => -26
  +             ACTIONS => {
  +                     'CHAR' => 34,
  +                     ">" => 53
  +             }
        },
        {#State 34
  -             DEFAULT => -24
  +             DEFAULT => -27
        },
        {#State 35
  +             DEFAULT => -24
  +     },
  +     {#State 36
                ACTIONS => {
  -                     "^" => 8,
  -                     "(" => 9,
  +                     "^" => 9,
  +                     "(" => 10,
                        'CHAR' => 4,
                        "<" => 3,
  -                     "[" => 13,
  -                     "." => 14
  +                     "[" => 14,
  +                     'CODE' => 6,
  +                     "." => 15
                },
                GOTOS => {
  -                     'expr' => 10,
  -                     'charclass' => 11,
  -                     'regex' => 53,
  -                     'regex1' => 6
  +                     'expr' => 11,
  +                     'charclass' => 12,
  +                     'regex' => 54,
  +                     'regex1' => 7
                }
        },
  -     {#State 36
  +     {#State 37
                ACTIONS => {
  -                     "(" => 9,
  +                     "(" => 10,
                        'CHAR' => 4,
                        "<" => 3,
  -                     "[" => 13,
  -                     "." => 14
  +                     "[" => 14,
  +                     'CODE' => 6,
  +                     "." => 15
                },
                GOTOS => {
  -                     'expr' => 54,
  -                     'charclass' => 11
  +                     'expr' => 55,
  +                     'charclass' => 12
                }
        },
  -     {#State 37
  +     {#State 38
                ACTIONS => {
  -                     "?" => 21,
  +                     "?" => 22,
                        'CHAR' => 4,
                        "<" => 3,
  -                     "+" => 22,
  -                     "{" => 23,
  -                     "(" => 9,
  -                     "|" => 25,
  -                     "*" => 26,
  -                     "[" => 13,
  -                     ")" => 55,
  -                     "." => 14
  -             },
  -             GOTOS => {
  -                     'expr' => 27,
  -                     'charclass' => 11,
  -                     'range' => 24
  +                     "+" => 23,
  +                     "{" => 24,
  +                     'CODE' => 6,
  +                     "(" => 10,
  +                     "|" => 26,
  +                     "*" => 27,
  +                     "[" => 14,
  +                     ")" => 56,
  +                     "." => 15
  +             },
  +             GOTOS => {
  +                     'expr' => 28,
  +                     'charclass' => 12,
  +                     'range' => 25
                }
        },
  -     {#State 38
  +     {#State 39
                DEFAULT => -18
        },
  -     {#State 39
  +     {#State 40
                DEFAULT => -16
        },
  -     {#State 40
  +     {#State 41
                ACTIONS => {
  -                     'NUM' => 58,
  -                     "}" => 56,
  -                     "," => 57
  +                     'NUM' => 59,
  +                     "}" => 57,
  +                     "," => 58
                }
        },
  -     {#State 41
  +     {#State 42
                ACTIONS => {
  -                     'NUM' => 42
  +                     'NUM' => 43
                },
                GOTOS => {
  -                     'number' => 59
  +                     'number' => 60
                }
        },
  -     {#State 42
  -             DEFAULT => -33
  -     },
        {#State 43
  -             DEFAULT => -20
  +             DEFAULT => -34
        },
        {#State 44
  +             DEFAULT => -20
  +     },
  +     {#State 45
                ACTIONS => {
  -                     "?" => 21,
  +                     "?" => 22,
                        'CHAR' => 4,
                        "<" => 3,
  -                     "+" => 22,
  -                     "{" => 23,
  -                     "(" => 9,
  -                     "*" => 26,
  -                     "[" => 13,
  -                     "." => 14
  +                     "+" => 23,
  +                     "{" => 24,
  +                     'CODE' => 6,
  +                     "(" => 10,
  +                     "*" => 27,
  +                     "[" => 14,
  +                     "." => 15
                },
                DEFAULT => -9,
                GOTOS => {
  -                     'expr' => 27,
  -                     'charclass' => 11,
  -                     'range' => 24
  +                     'expr' => 28,
  +                     'charclass' => 12,
  +                     'range' => 25
                }
        },
  -     {#State 45
  -             DEFAULT => -14
  -     },
        {#State 46
  -             ACTIONS => {
  -                     "-" => 60
  -             },
  -             DEFAULT => -41
  +             DEFAULT => -14
        },
        {#State 47
  -             DEFAULT => -38
  -     },
  -     {#State 48
                ACTIONS => {
                        "-" => 61
                },
  -             DEFAULT => -43
  +             DEFAULT => -42
  +     },
  +     {#State 48
  +             DEFAULT => -39
        },
        {#State 49
  -             DEFAULT => -36
  +             ACTIONS => {
  +                     "-" => 62
  +             },
  +             DEFAULT => -44
        },
        {#State 50
  +             DEFAULT => -37
  +     },
  +     {#State 51
                ACTIONS => {
  -                     'NUM' => 48,
  -                     'CHAR' => 46,
  -                     "]" => 62
  +                     'NUM' => 49,
  +                     'CHAR' => 47,
  +                     "]" => 63
                },
                GOTOS => {
  -                     'classpiece' => 47
  +                     'classpiece' => 48
                }
        },
  -     {#State 51
  +     {#State 52
                ACTIONS => {
  -                     'NUM' => 48,
  -                     'CHAR' => 46,
  -                     "]" => 63
  +                     'NUM' => 49,
  +                     'CHAR' => 47,
  +                     "]" => 64
                },
                GOTOS => {
  -                     'classpiece' => 47
  +                     'classpiece' => 48
                }
        },
  -     {#State 52
  +     {#State 53
                DEFAULT => -25
        },
  -     {#State 53
  +     {#State 54
                DEFAULT => -3
        },
  -     {#State 54
  +     {#State 55
                ACTIONS => {
  -                     "?" => 21,
  +                     "?" => 22,
                        'CHAR' => 4,
                        "<" => 3,
  -                     "+" => 22,
  -                     "{" => 23,
  -                     "(" => 9,
  -                     "|" => 25,
  -                     "*" => 26,
  -                     "[" => 13,
  -                     ")" => 64,
  -                     "." => 14
  -             },
  -             GOTOS => {
  -                     'expr' => 27,
  -                     'charclass' => 11,
  -                     'range' => 24
  +                     "+" => 23,
  +                     "{" => 24,
  +                     'CODE' => 6,
  +                     "(" => 10,
  +                     "|" => 26,
  +                     "*" => 27,
  +                     "[" => 14,
  +                     ")" => 65,
  +                     "." => 15
  +             },
  +             GOTOS => {
  +                     'expr' => 28,
  +                     'charclass' => 12,
  +                     'range' => 25
                }
        },
  -     {#State 55
  -             DEFAULT => -22
  -     },
        {#State 56
  -             DEFAULT => -30
  +             DEFAULT => -22
        },
        {#State 57
  +             DEFAULT => -31
  +     },
  +     {#State 58
                ACTIONS => {
  -                     'NUM' => 42,
  -                     "}" => 65
  +                     'NUM' => 43,
  +                     "}" => 66
                },
                GOTOS => {
  -                     'number' => 66
  +                     'number' => 67
                }
        },
  -     {#State 58
  -             DEFAULT => -32
  -     },
        {#State 59
  -             ACTIONS => {
  -                     'NUM' => 58,
  -                     "}" => 67
  -             }
  +             DEFAULT => -33
        },
        {#State 60
                ACTIONS => {
  -                     'CHAR' => 68
  +                     'NUM' => 59,
  +                     "}" => 68
                }
        },
        {#State 61
                ACTIONS => {
  -                     'NUM' => 69
  +                     'CHAR' => 69
                }
        },
        {#State 62
  -             DEFAULT => -35
  +             ACTIONS => {
  +                     'NUM' => 70
  +             }
        },
        {#State 63
  -             DEFAULT => -34
  +             DEFAULT => -36
        },
        {#State 64
  -             DEFAULT => -23
  +             DEFAULT => -35
        },
        {#State 65
  -             DEFAULT => -31
  +             DEFAULT => -23
        },
        {#State 66
  -             ACTIONS => {
  -                     'NUM' => 58,
  -                     "}" => 70
  -             }
  +             DEFAULT => -32
        },
        {#State 67
  -             DEFAULT => -29
  +             ACTIONS => {
  +                     'NUM' => 59,
  +                     "}" => 71
  +             }
        },
        {#State 68
  -             DEFAULT => -40
  +             DEFAULT => -30
        },
        {#State 69
  -             DEFAULT => -42
  +             DEFAULT => -41
        },
        {#State 70
  -             DEFAULT => -28
  +             DEFAULT => -43
  +     },
  +     {#State 71
  +             DEFAULT => -29
        }
   ],
                                     yyrules  =>
  @@ -1021,19 +1061,19 @@
        [#Rule 1
                 'rules', 2,
   sub
  -#line 41 "lib/Regex/Grammar.y"
  +#line 67 "lib/Regex/Grammar.y"
   { return [ @{ $_[1] }, $_[2] ] }
        ],
        [#Rule 2
                 'rules', 0,
   sub
  -#line 43 "lib/Regex/Grammar.y"
  +#line 69 "lib/Regex/Grammar.y"
   { return []; }
        ],
        [#Rule 3
                 'rule', 4,
   sub
  -#line 47 "lib/Regex/Grammar.y"
  +#line 73 "lib/Regex/Grammar.y"
   { my ($name, $tree) = @_[2,4];
        # This is really not the place for this...
        if ($tree->{name} eq 'scan' && $name ne 'default') {
  @@ -1047,7 +1087,7 @@
        [#Rule 4
                 'rule', 1,
   sub
  -#line 57 "lib/Regex/Grammar.y"
  +#line 83 "lib/Regex/Grammar.y"
   {
        my $op = op('rule' => [ 'default', $_[1], 1 + $::paren ]);
        $::paren = 0; # FIXME!
  @@ -1057,235 +1097,241 @@
        [#Rule 5
                 'regex', 2,
   sub
  -#line 65 "lib/Regex/Grammar.y"
  +#line 91 "lib/Regex/Grammar.y"
   { return $_[2]; }
        ],
        [#Rule 6
                 'regex', 1,
   sub
  -#line 67 "lib/Regex/Grammar.y"
  +#line 93 "lib/Regex/Grammar.y"
   { return op('scan' => [ $_[1] ]); }
        ],
        [#Rule 7
                 'regex1', 1,
   sub
  -#line 71 "lib/Regex/Grammar.y"
  +#line 97 "lib/Regex/Grammar.y"
   { return $_[1]; }
        ],
        [#Rule 8
                 'regex1', 2,
   sub
  -#line 73 "lib/Regex/Grammar.y"
  +#line 99 "lib/Regex/Grammar.y"
   { return op('seq' => [ $_[1], op('atend') ]); }
        ],
        [#Rule 9
                 'expr', 3,
   sub
  -#line 77 "lib/Regex/Grammar.y"
  +#line 103 "lib/Regex/Grammar.y"
   { return op('alternate' => [ $_[1], $_[3] ]); }
        ],
        [#Rule 10
                 'expr', 2,
   sub
  -#line 79 "lib/Regex/Grammar.y"
  +#line 105 "lib/Regex/Grammar.y"
   { return op('seq' => [ $_[1], $_[2] ]); }
        ],
        [#Rule 11
                 'expr', 1,
   sub
  -#line 81 "lib/Regex/Grammar.y"
  +#line 107 "lib/Regex/Grammar.y"
   { return op('match' => [ ord($_[1]) ]); }
        ],
        [#Rule 12
                 'expr', 1,
   sub
  -#line 83 "lib/Regex/Grammar.y"
  +#line 109 "lib/Regex/Grammar.y"
   { return op('classpieces' => [ $_[1] ]); }
        ],
        [#Rule 13
                 'expr', 2,
   sub
  -#line 85 "lib/Regex/Grammar.y"
  +#line 111 "lib/Regex/Grammar.y"
   { return op('multi_match' => [ 0, -1, TRUE, $_[1] ]); }
        ],
        [#Rule 14
                 'expr', 3,
   sub
  -#line 87 "lib/Regex/Grammar.y"
  +#line 113 "lib/Regex/Grammar.y"
   { return op('multi_match' => [ 0, -1, FALSE, $_[1] ]); }
        ],
        [#Rule 15
                 'expr', 2,
   sub
  -#line 89 "lib/Regex/Grammar.y"
  +#line 115 "lib/Regex/Grammar.y"
   { return op('multi_match' => [ 1, -1, TRUE, $_[1] ]); }
        ],
        [#Rule 16
                 'expr', 3,
   sub
  -#line 91 "lib/Regex/Grammar.y"
  +#line 117 "lib/Regex/Grammar.y"
   { return op('multi_match' => [ 1, -1, FALSE, $_[1] ]); }
        ],
        [#Rule 17
                 'expr', 2,
   sub
  -#line 93 "lib/Regex/Grammar.y"
  +#line 119 "lib/Regex/Grammar.y"
   { return op('multi_match' => [ 0, 1, TRUE, $_[1] ]); }
        ],
        [#Rule 18
                 'expr', 3,
   sub
  -#line 95 "lib/Regex/Grammar.y"
  +#line 121 "lib/Regex/Grammar.y"
   { return op('multi_match' => [ 0, 1, FALSE, $_[1] ]); }
        ],
        [#Rule 19
                 'expr', 2,
   sub
  -#line 97 "lib/Regex/Grammar.y"
  +#line 123 "lib/Regex/Grammar.y"
   { return op('multi_match' => [ $_[2]->{min}, $_[2]->{max}, TRUE, $_[1] ]); }
        ],
        [#Rule 20
                 'expr', 3,
   sub
  -#line 99 "lib/Regex/Grammar.y"
  +#line 125 "lib/Regex/Grammar.y"
   { return op('multi_match' => [ $_[2]->{min}, $_[2]->{max}, FALSE, $_[1] ]); }
        ],
        [#Rule 21
                 '@1-1', 0,
   sub
  -#line 100 "lib/Regex/Grammar.y"
  +#line 126 "lib/Regex/Grammar.y"
   { ++$::paren }
        ],
        [#Rule 22
                 'expr', 4,
   sub
  -#line 101 "lib/Regex/Grammar.y"
  +#line 127 "lib/Regex/Grammar.y"
   { return op('group' => [ $_[3], $_[2] ]) }
        ],
        [#Rule 23
                 'expr', 5,
   sub
  -#line 103 "lib/Regex/Grammar.y"
  +#line 129 "lib/Regex/Grammar.y"
   { return $_[4]; }
        ],
        [#Rule 24
                 'expr', 3,
   sub
  -#line 105 "lib/Regex/Grammar.y"
  +#line 131 "lib/Regex/Grammar.y"
   { return op('call' => [ $_[2], 0 ]) }
        ],
        [#Rule 25
                 'expr', 4,
   sub
  -#line 107 "lib/Regex/Grammar.y"
  +#line 133 "lib/Regex/Grammar.y"
   { return op('group', [ op('call' => [ $_[3], 1 ]), ++$::paren ]) }
        ],
        [#Rule 26
  +              'expr', 1,
  +sub
  +#line 135 "lib/Regex/Grammar.y"
  +{ return op('code' => [ $_[1] ]) }
  +     ],
  +     [#Rule 27
                 'rulename', 2,
   sub
  -#line 111 "lib/Regex/Grammar.y"
  +#line 139 "lib/Regex/Grammar.y"
   { return $_[1] . $_[2]; }
        ],
  -     [#Rule 27
  +     [#Rule 28
                 'rulename', 0,
   sub
  -#line 113 "lib/Regex/Grammar.y"
  +#line 141 "lib/Regex/Grammar.y"
   { return '' }
        ],
  -     [#Rule 28
  +     [#Rule 29
                 'range', 5,
   sub
  -#line 117 "lib/Regex/Grammar.y"
  +#line 145 "lib/Regex/Grammar.y"
   { return { min => $_[2], max => $_[4] }; }
        ],
  -     [#Rule 29
  +     [#Rule 30
                 'range', 4,
   sub
  -#line 119 "lib/Regex/Grammar.y"
  +#line 147 "lib/Regex/Grammar.y"
   { return { min => 0, max => $_[3] }; }
        ],
  -     [#Rule 30
  +     [#Rule 31
                 'range', 3,
   sub
  -#line 121 "lib/Regex/Grammar.y"
  +#line 149 "lib/Regex/Grammar.y"
   { return { min => $_[2], max => $_[2] }; }
        ],
  -     [#Rule 31
  +     [#Rule 32
                 'range', 4,
   sub
  -#line 123 "lib/Regex/Grammar.y"
  +#line 151 "lib/Regex/Grammar.y"
   { return { min => $_[2], max => -1 }; }
        ],
  -     [#Rule 32
  +     [#Rule 33
                 'number', 2,
   sub
  -#line 126 "lib/Regex/Grammar.y"
  +#line 154 "lib/Regex/Grammar.y"
   { return $_[1] * 10 + $_[2]; }
        ],
  -     [#Rule 33
  +     [#Rule 34
                 'number', 1,
   sub
  -#line 127 "lib/Regex/Grammar.y"
  +#line 155 "lib/Regex/Grammar.y"
   { return $_[1]; }
        ],
  -     [#Rule 34
  +     [#Rule 35
                 'charclass', 4,
   sub
  -#line 130 "lib/Regex/Grammar.y"
  +#line 158 "lib/Regex/Grammar.y"
   { return [ 'neg', $_[3] ]; }
        ],
  -     [#Rule 35
  +     [#Rule 36
                 'charclass', 4,
   sub
  -#line 131 "lib/Regex/Grammar.y"
  +#line 159 "lib/Regex/Grammar.y"
   { push @{$_[3]}, '-'; return $_[3]; }
        ],
  -     [#Rule 36
  +     [#Rule 37
                 'charclass', 3,
   sub
  -#line 132 "lib/Regex/Grammar.y"
  +#line 160 "lib/Regex/Grammar.y"
   { return $_[2]; }
        ],
  -     [#Rule 37
  +     [#Rule 38
                 'charclass', 1,
   sub
  -#line 133 "lib/Regex/Grammar.y"
  +#line 161 "lib/Regex/Grammar.y"
   { return [ 'neg', [] ]; }
        ],
  -     [#Rule 38
  +     [#Rule 39
                 'classpieces', 2,
   sub
  -#line 136 "lib/Regex/Grammar.y"
  +#line 164 "lib/Regex/Grammar.y"
   { push @{$_[1]}, $_[2]; return $_[1]; }
        ],
  -     [#Rule 39
  +     [#Rule 40
                 'classpieces', 0,
   sub
  -#line 137 "lib/Regex/Grammar.y"
  +#line 165 "lib/Regex/Grammar.y"
   { return []; }
        ],
  -     [#Rule 40
  +     [#Rule 41
                 'classpiece', 3,
   sub
  -#line 141 "lib/Regex/Grammar.y"
  +#line 169 "lib/Regex/Grammar.y"
   { return [ $_[1], $_[3] ] }
        ],
  -     [#Rule 41
  +     [#Rule 42
                 'classpiece', 1,
   sub
  -#line 143 "lib/Regex/Grammar.y"
  +#line 171 "lib/Regex/Grammar.y"
   { return $_[1]; }
        ],
  -     [#Rule 42
  +     [#Rule 43
                 'classpiece', 3,
   sub
  -#line 145 "lib/Regex/Grammar.y"
  +#line 173 "lib/Regex/Grammar.y"
   { return [ $_[1], $_[3] ] }
        ],
  -     [#Rule 43
  +     [#Rule 44
                 'classpiece', 1,
   sub
  -#line 147 "lib/Regex/Grammar.y"
  +#line 175 "lib/Regex/Grammar.y"
   { return $_[1]; }
        ]
   ],
  @@ -1293,7 +1339,7 @@
       bless($self,$class);
   }
   
  -#line 150 "lib/Regex/Grammar.y"
  +#line 178 "lib/Regex/Grammar.y"
   
   
   1;
  
  
  
  1.12      +38 -10    parrot/languages/regex/lib/Regex/Grammar.y
  
  Index: Grammar.y
  ===================================================================
  RCS file: /cvs/public/parrot/languages/regex/lib/Regex/Grammar.y,v
  retrieving revision 1.11
  retrieving revision 1.12
  diff -u -r1.11 -r1.12
  --- Grammar.y 25 Nov 2004 03:00:30 -0000      1.11
  +++ Grammar.y 25 Nov 2004 04:37:57 -0000      1.12
  @@ -5,20 +5,46 @@
   
   sub tokenize {
       my $data = shift;
  -    my @tokens = $data =~ /(\\.|.)/g;
  +
  +    my @tokens;
       my @types;
  -    foreach (@tokens) {
  -        if (/^\\(.)/) {
  -            $_ = $1;
  -            push @types, 'CHAR';
  -        } elsif (/\d/) {
  -            push @types, 'NUM';
  -        } elsif (/[\w\s]/) {
  +    my $state = 0;
  +    my $depth = 0;
  +    for my $c (split(//, $data)) {
  +        if ($state == 0) {
  +            if ($c eq '\\') {
  +                $state = 1;
  +            } elsif ($c eq '{') {
  +                push @tokens, '{';
  +                push @types, 'CODE';
  +                $state = 2;
  +                $depth = 1;
  +            } else {
  +                push @tokens, $c;
  +                if ($c =~ /\d/) {
  +                    push @types, 'NUM';
  +                } elsif ($c =~ /[\w\s]/) {
  +                    push @types, 'CHAR';
  +                } else {
  +                    push @types, $c;
  +                }
  +            }
  +        } elsif ($state == 1) {
  +            push @tokens, $c;
               push @types, 'CHAR';
  +            $state = 0;
           } else {
  -            push @types, $_;
  +            if ($c eq '{') {
  +                ++$depth;
  +            } elsif ($c eq '}') {
  +                if (--$depth == 0) {
  +                    $state = 0;
  +                }
  +            }
  +            $tokens[-1] .= $c;
           }
       }
  +
       return [EMAIL PROTECTED], [EMAIL PROTECTED];
   }
   
  @@ -39,7 +65,7 @@
   
   rules : rules rule
     { return [ @{ $_[1] }, $_[2] ] }
  -      | 
  +      |
     { return []; }
   ;
   
  @@ -105,6 +131,8 @@
      { return op('call' => [ $_[2], 0 ]) }
        | '<' '?' rulename '>'
      { return op('group', [ op('call' => [ $_[3], 1 ]), ++$::paren ]) }
  +     | CODE
  +   { return op('code' => [ $_[1] ]) }
   ;
   
   rulename : rulename CHAR
  
  
  
  1.11      +8 -0      parrot/languages/regex/lib/Regex/CodeGen/IMCC.pm
  
  Index: IMCC.pm
  ===================================================================
  RCS file: /cvs/public/parrot/languages/regex/lib/Regex/CodeGen/IMCC.pm,v
  retrieving revision 1.10
  retrieving revision 1.11
  diff -u -r1.10 -r1.11
  --- IMCC.pm   25 Nov 2004 03:51:28 -0000      1.10
  +++ IMCC.pm   25 Nov 2004 04:37:58 -0000      1.11
  @@ -477,4 +477,12 @@
               "unless <rx_tmp>, $fail_label");
   }
   
  +sub output_code {
  +    my ($self, $code) = @_;
  +    # Assume, for now, that the code is PIR code
  +    return ("# START EMBEDDED CODE",
  +            split(/\n/, substr($code, 1, -1)),
  +            "# END EMBEDDED CODE");
  +}
  +
   1;
  
  
  
  1.2       +6 -0      parrot/languages/regex/lib/Regex/CodeGen/Perl5.pm
  
  Index: Perl5.pm
  ===================================================================
  RCS file: /cvs/public/parrot/languages/regex/lib/Regex/CodeGen/Perl5.pm,v
  retrieving revision 1.1
  retrieving revision 1.2
  diff -u -r1.1 -r1.2
  --- Perl5.pm  18 Nov 2004 07:01:13 -0000      1.1
  +++ Perl5.pm  25 Nov 2004 04:37:58 -0000      1.2
  @@ -460,4 +460,10 @@
       return (@ops, "goto $fail_label if ! \$$uid\->{'!RESULT'};");
   }
   
  +sub output_code {
  +    my ($self, $code) = @_;
  +    # Assume, for now, that the code is Perl5 code
  +    return ("# START EMBEDDED CODE", $code, "# END EMBEDDED CODE");
  +}
  +
   1;
  
  
  
  1.14      +12 -0     parrot/languages/regex/lib/Regex/Ops/Tree.pm
  
  Index: Tree.pm
  ===================================================================
  RCS file: /cvs/public/parrot/languages/regex/lib/Regex/Ops/Tree.pm,v
  retrieving revision 1.13
  retrieving revision 1.14
  diff -u -r1.13 -r1.14
  --- Tree.pm   8 Nov 2004 03:15:48 -0000       1.13
  +++ Tree.pm   25 Nov 2004 04:38:00 -0000      1.14
  @@ -22,6 +22,7 @@
   # scan : Scan through input until R matches
   # atend : At the end of the input?
   # advance : Unconditionally advance 1 char
  +# code : Embedded code, in some language
   
   # Stuff that is used for optimization
   # -----------------------------------
  @@ -48,6 +49,7 @@
   @Regex::Ops::Tree::check::ISA        = qw(Regex::Ops::Tree);
   
   @Regex::Ops::Tree::call::ISA         = qw(Regex::Ops::Tree);
  [EMAIL PROTECTED]::Ops::Tree::code::ISA         = qw(Regex::Ops::Tree);
   
   # Construct a new op
   sub op {
  @@ -342,6 +344,16 @@
   sub hasback { 1 }
   sub startset { undef }
   
  +# Embedded code is truly unpredictable. Although there will probably
  +# be pragmata for allowing code to specify that it won't muck with
  +# things.
  +package Regex::Ops::Tree::code;
  +sub minlen { 0 }
  +sub maxlen { undef }
  +sub dfa_safe { 0 }
  +sub hasback { 0 } # FIXME! code should be allowed to have BACK{} blocks
  +sub startset { undef }
  +
   ########################################################################
   # Rendering - only used for debugging for now
   ########################################################################

cvs commit: parrot/languages/regex/lib/Regex/Ops Tree.pm

Reply via email to