cvsuser 04/11/24 20:38:00
Modified: languages/regex README
languages/regex/lib/Regex Driver.pm Grammar.pm Grammar.y
languages/regex/lib/Regex/CodeGen IMCC.pm Perl5.pm
languages/regex/lib/Regex/Ops Tree.pm
Log:
Embedded code! Good toy.
Revision Changes Path
1.15 +31 -29 parrot/languages/regex/README
Index: README
===================================================================
RCS file: /cvs/public/parrot/languages/regex/README,v
retrieving revision 1.14
retrieving revision 1.15
diff -u -r1.14 -r1.15
--- README 25 Nov 2004 03:51:25 -0000 1.14
+++ README 25 Nov 2004 04:37:55 -0000 1.15
@@ -119,6 +119,8 @@
[a-z] - character classes
{n,m} - greedily match n..m times
{n,m}? - nongreedily match n..m times
+ <rule> - rule invocation (Perl6 feature)
+ (?{ }) - embedded code (just {...} in Perl6)
Missing perl5 features:
\w - named character classes
@@ -132,13 +134,11 @@
(?!R) - negative look-ahead assertion
(?<=) - look-behind assertion
(?<!) - negative look-behind assertion
- (?{ }) - embedded code
(??{ }) - match-time evaluated subexpression
(?>R) - independent subexpression
(?(cond)R|S)
- conditional expression
(R?)* - empty match suppression
- <rule> - rule invocation (Perl6 feature)
Missing perl6 features:
:
@@ -252,42 +252,44 @@
================
This rule compiler is not the officially blessed one. That would be
-Patrick Michaud's p6ge, which you can find in compilers/p6ge. This
-engine predates p6ge by a year or two, but never managed to generate
-sufficient interest to get anyone else involved. Patrick must have
-been aware of this engine, since I told him about it both personally
-and in a message to perl6-internals, but he has never acknowledged its
-existence nor explained why he felt the need to start from scratch. I
-have to conclude that he either looked at it and didn't like the
-design or the implementation; or he just wanted to start from scratch
-so that he could fully understand the system he was working on. All of
-which are perfectly good reasons, so I bear no ill will towards the
-official effort.
+Patrick Michaud's PGE (Parrot Grammar Engine) , which you can find in
+compilers/pge. This engine (in languages/regex, referred to in the
+following text as l/rx) predates pge by a year or two, but never
+managed to generate sufficient interest to get anyone else involved.
+Patrick must have been aware of this engine, since I told him about it
+both personally and in a message to perl6-internals, but he has never
+acknowledged its existence nor explained why he felt the need to start
+from scratch. I have to conclude that he either looked at it and
+didn't like the design or the implementation; or he just wanted to
+start from scratch so that he could fully understand the system he was
+working on. All of which are perfectly good reasons, so I bear no ill
+will towards the official effort.
-I am assuming that p6ge is going to get the momentum of the community
+I am assuming that pge is going to get the momentum of the community
behind it, so I would advise anyone interested in working on a rule
engine to look there first. (Look for discussion on the
perl6-internals and perl6-compiler mailing lists.) However, I still
intend to work on this engine for a while longer, and welcome any
interested participants. (Send any requests/comments/suggestions
either to perl6-internals or directly to me at [EMAIL PROTECTED]) So
-far, I have only briefly looked at p6ge, but I think this
+far, I have only briefly looked at pge, but I think this
languages/regex engine has enough of a different approach that it is
still valuable for gathering lessons -- and may still make the most
sense in the long run.
That last statement demands a bit of explanation, so here's an excerpt
-of a mail I sent out after my first look at p6ge (remember, p6ge has
+of a mail I sent out after my first look at pge (remember, pge has
probably advanced past this point by now):
-It sounds like languages/regex handles pretty much exactly the same
-things as p6ge, probably a few more and a few less. I haven't actually
-looked at the code, but from the description I'd guess that the main
-differences are:
+It sounds like l/rx handles pretty much exactly the same things as
+pge, probably a few more and a few less. I haven't actually looked at
+the code, but from the description I'd guess that the main differences
+are:
- - p6ge is implemented in C; l/rx in Perl5
- - p6ge generates PIR; l/rx has both PIR and Perl5 backends
- - p6ge uses coroutines and continuations; I have always been too
+ - pge is implemented in C; l/rx in Perl5. Both are reaching towards
+ the "bootstrap point", when they'll be implemented in PIR.
+ - pge generates PIR; l/rx has both PIR and Perl5 backends
+ - pge uses coroutines and continuations; I have always been too
wary of their stability, so I use plain subs (with a 'mode'
parameter to tell it whether to try to match or backtrack)
- Both allow you to "continue" a match to find all other possible
@@ -296,18 +298,18 @@
anyway)
- l/rx uses match objects (dynclasses/match.pmc) and automatically
generates a parse tree out of them
- - p6ge has built-in "dump out the matching info" routines; I make my
+ - pge has built-in "dump out the matching info" routines; I make my
test harnesses generate their own. I'm jealous.
- - The feature sets are nearly identical.Makes sense, I suppose --
+ - The feature sets are nearly identical. Makes sense, I suppose --
low-hanging fruit and all that.
- It sounds like the internal design is rather different. I try hard
- to compile down to very minimalistic PIR ops. It sounds like p6ge
+ to compile down to very minimalistic PIR ops. It sounds like pge
uses lots of higher-level operations, to do things like processing
- a whole chunk at a time. (Although on the other hand, p6ge uses
+ a whole chunk at a time. (Although on the other hand, pge uses
more native Parrot flow control mechanisms than I do.)
- Closely related to the above, I have a number of optimizations
- already implemented, but I suspect p6ge will end up with a very
+ already implemented, but I suspect pge will end up with a very
different set of optimizations.
- I have on average about 5 hours a week to work on l/rx; Patrick
has quite a bit more :-) (Which does NOT mean that I work faster;
- my engine is at least a year older than p6ge.)
+ my engine is at least a year older than pge.)
1.3 +10 -0 parrot/languages/regex/lib/Regex/Driver.pm
Index: Driver.pm
===================================================================
RCS file: /cvs/public/parrot/languages/regex/lib/Regex/Driver.pm,v
retrieving revision 1.2
retrieving revision 1.3
diff -u -r1.2 -r1.3
--- Driver.pm 25 Nov 2004 03:00:30 -0000 1.2
+++ Driver.pm 25 Nov 2004 04:37:57 -0000 1.3
@@ -109,6 +109,16 @@
package Regex::Driver::PIR;
our @ISA = qw(Regex::Driver);
+# sub output_header {
+# my ($self, $fh) = @_;
+# print $fh <<'END';
+# .sub __init_regex @LOAD
+# loadlib $P0, "match_group"
+# .return ()
+# .end
+# END
+# }
+
sub output_rule_body {
my ($self, $fh, $subname, $rule, $ctx, $instructions) = @_;
print $fh join("\n", @$instructions), "\n";
1.12 +345 -299 parrot/languages/regex/lib/Regex/Grammar.pm
Index: Grammar.pm
===================================================================
RCS file: /cvs/public/parrot/languages/regex/lib/Regex/Grammar.pm,v
retrieving revision 1.11
retrieving revision 1.12
diff -u -r1.11 -r1.12
--- Grammar.pm 25 Nov 2004 03:00:30 -0000 1.11
+++ Grammar.pm 25 Nov 2004 04:37:57 -0000 1.12
@@ -497,20 +497,46 @@
sub tokenize {
my $data = shift;
- my @tokens = $data =~ /(\\.|.)/g;
+
+ my @tokens;
my @types;
- foreach (@tokens) {
- if (/^\\(.)/) {
- $_ = $1;
- push @types, 'CHAR';
- } elsif (/\d/) {
- push @types, 'NUM';
- } elsif (/[\w\s]/) {
+ my $state = 0;
+ my $depth = 0;
+ for my $c (split(//, $data)) {
+ if ($state == 0) {
+ if ($c eq '\\') {
+ $state = 1;
+ } elsif ($c eq '{') {
+ push @tokens, '{';
+ push @types, 'CODE';
+ $state = 2;
+ $depth = 1;
+ } else {
+ push @tokens, $c;
+ if ($c =~ /\d/) {
+ push @types, 'NUM';
+ } elsif ($c =~ /[\w\s]/) {
+ push @types, 'CHAR';
+ } else {
+ push @types, $c;
+ }
+ }
+ } elsif ($state == 1) {
+ push @tokens, $c;
push @types, 'CHAR';
+ $state = 0;
} else {
- push @types, $_;
+ if ($c eq '{') {
+ ++$depth;
+ } elsif ($c eq '}') {
+ if (--$depth == 0) {
+ $state = 0;
+ }
+ }
+ $tokens[-1] .= $c;
}
}
+
return [EMAIL PROTECTED], [EMAIL PROTECTED];
}
@@ -542,17 +568,18 @@
'CHAR' => 4,
"<" => 3,
"&" => 5,
- "^" => 8,
- "(" => 9,
- "[" => 13,
- "." => 14
+ 'CODE' => 6,
+ "^" => 9,
+ "(" => 10,
+ "[" => 14,
+ "." => 15
},
GOTOS => {
- 'charclass' => 11,
- 'expr' => 10,
- 'regex' => 12,
- 'rule' => 7,
- 'regex1' => 6
+ 'charclass' => 12,
+ 'expr' => 11,
+ 'regex' => 13,
+ 'rule' => 8,
+ 'regex1' => 7
}
},
{#State 2
@@ -560,457 +587,470 @@
},
{#State 3
ACTIONS => {
- "?" => 15
+ "?" => 16
},
- DEFAULT => -27,
+ DEFAULT => -28,
GOTOS => {
- 'rulename' => 16
+ 'rulename' => 17
}
},
{#State 4
DEFAULT => -11
},
{#State 5
- DEFAULT => -27,
+ DEFAULT => -28,
GOTOS => {
- 'rulename' => 17
+ 'rulename' => 18
}
},
{#State 6
- DEFAULT => -6
+ DEFAULT => -26
},
{#State 7
- DEFAULT => -1
+ DEFAULT => -6
},
{#State 8
+ DEFAULT => -1
+ },
+ {#State 9
ACTIONS => {
- "(" => 9,
+ "(" => 10,
'CHAR' => 4,
"<" => 3,
- "[" => 13,
- "." => 14
+ "[" => 14,
+ 'CODE' => 6,
+ "." => 15
},
GOTOS => {
- 'expr' => 10,
- 'charclass' => 11,
- 'regex1' => 18
+ 'expr' => 11,
+ 'charclass' => 12,
+ 'regex1' => 19
}
},
- {#State 9
+ {#State 10
ACTIONS => {
- "?" => 19
+ "?" => 20
},
DEFAULT => -21,
GOTOS => {
- '@1-1' => 20
+ '@1-1' => 21
}
},
- {#State 10
+ {#State 11
ACTIONS => {
- "?" => 21,
+ "?" => 22,
'CHAR' => 4,
"<" => 3,
- "+" => 22,
- "{" => 23,
- "(" => 9,
- "|" => 25,
- "*" => 26,
- "\$" => 28,
- "[" => 13,
- "." => 14
+ "+" => 23,
+ "{" => 24,
+ 'CODE' => 6,
+ "|" => 26,
+ "(" => 10,
+ "*" => 27,
+ "\$" => 29,
+ "[" => 14,
+ "." => 15
},
DEFAULT => -7,
GOTOS => {
- 'expr' => 27,
- 'charclass' => 11,
- 'range' => 24
+ 'expr' => 28,
+ 'charclass' => 12,
+ 'range' => 25
}
},
- {#State 11
+ {#State 12
DEFAULT => -12
},
- {#State 12
+ {#State 13
DEFAULT => -4
},
- {#State 13
+ {#State 14
ACTIONS => {
- "-" => 30,
- "^" => 31
+ "-" => 31,
+ "^" => 32
},
- DEFAULT => -39,
+ DEFAULT => -40,
GOTOS => {
- 'classpieces' => 29
+ 'classpieces' => 30
}
},
- {#State 14
- DEFAULT => -37
- },
{#State 15
- DEFAULT => -27,
- GOTOS => {
- 'rulename' => 32
- }
+ DEFAULT => -38
},
{#State 16
- ACTIONS => {
- 'CHAR' => 33,
- ">" => 34
+ DEFAULT => -28,
+ GOTOS => {
+ 'rulename' => 33
}
},
{#State 17
ACTIONS => {
- 'CHAR' => 33,
- "=" => 35
+ 'CHAR' => 34,
+ ">" => 35
}
},
{#State 18
- DEFAULT => -5
+ ACTIONS => {
+ 'CHAR' => 34,
+ "=" => 36
+ }
},
{#State 19
+ DEFAULT => -5
+ },
+ {#State 20
ACTIONS => {
- ":" => 36
+ ":" => 37
}
},
- {#State 20
+ {#State 21
ACTIONS => {
- "(" => 9,
+ "(" => 10,
'CHAR' => 4,
"<" => 3,
- "[" => 13,
- "." => 14
+ "[" => 14,
+ 'CODE' => 6,
+ "." => 15
},
GOTOS => {
- 'expr' => 37,
- 'charclass' => 11
+ 'expr' => 38,
+ 'charclass' => 12
}
},
- {#State 21
+ {#State 22
ACTIONS => {
"?" => undef
},
DEFAULT => -17
},
- {#State 22
+ {#State 23
ACTIONS => {
- "?" => 39
+ "?" => 40
},
DEFAULT => -15
},
- {#State 23
+ {#State 24
ACTIONS => {
- 'NUM' => 42,
- "," => 41
+ 'NUM' => 43,
+ "," => 42
},
GOTOS => {
- 'number' => 40
+ 'number' => 41
}
},
- {#State 24
+ {#State 25
ACTIONS => {
- "?" => 43
+ "?" => 44
},
DEFAULT => -19
},
- {#State 25
+ {#State 26
ACTIONS => {
- "(" => 9,
+ "(" => 10,
'CHAR' => 4,
"<" => 3,
- "[" => 13,
- "." => 14
+ "[" => 14,
+ 'CODE' => 6,
+ "." => 15
},
GOTOS => {
- 'expr' => 44,
- 'charclass' => 11
+ 'expr' => 45,
+ 'charclass' => 12
}
},
- {#State 26
+ {#State 27
ACTIONS => {
- "?" => 45
+ "?" => 46
},
DEFAULT => -13
},
- {#State 27
+ {#State 28
ACTIONS => {
- "?" => 21,
+ "?" => 22,
'CHAR' => 4,
"<" => 3,
- "+" => 22,
- "{" => 23,
- "(" => 9,
- "*" => 26,
- "[" => 13,
- "." => 14
+ "+" => 23,
+ "{" => 24,
+ 'CODE' => 6,
+ "(" => 10,
+ "*" => 27,
+ "[" => 14,
+ "." => 15
},
DEFAULT => -10,
GOTOS => {
- 'expr' => 27,
- 'charclass' => 11,
- 'range' => 24
+ 'expr' => 28,
+ 'charclass' => 12,
+ 'range' => 25
}
},
- {#State 28
+ {#State 29
DEFAULT => -8
},
- {#State 29
+ {#State 30
ACTIONS => {
- 'NUM' => 48,
- 'CHAR' => 46,
- "]" => 49
+ 'NUM' => 49,
+ 'CHAR' => 47,
+ "]" => 50
},
GOTOS => {
- 'classpiece' => 47
- }
- },
- {#State 30
- DEFAULT => -39,
- GOTOS => {
- 'classpieces' => 50
+ 'classpiece' => 48
}
},
{#State 31
- DEFAULT => -39,
+ DEFAULT => -40,
GOTOS => {
'classpieces' => 51
}
},
{#State 32
- ACTIONS => {
- 'CHAR' => 33,
- ">" => 52
+ DEFAULT => -40,
+ GOTOS => {
+ 'classpieces' => 52
}
},
{#State 33
- DEFAULT => -26
+ ACTIONS => {
+ 'CHAR' => 34,
+ ">" => 53
+ }
},
{#State 34
- DEFAULT => -24
+ DEFAULT => -27
},
{#State 35
+ DEFAULT => -24
+ },
+ {#State 36
ACTIONS => {
- "^" => 8,
- "(" => 9,
+ "^" => 9,
+ "(" => 10,
'CHAR' => 4,
"<" => 3,
- "[" => 13,
- "." => 14
+ "[" => 14,
+ 'CODE' => 6,
+ "." => 15
},
GOTOS => {
- 'expr' => 10,
- 'charclass' => 11,
- 'regex' => 53,
- 'regex1' => 6
+ 'expr' => 11,
+ 'charclass' => 12,
+ 'regex' => 54,
+ 'regex1' => 7
}
},
- {#State 36
+ {#State 37
ACTIONS => {
- "(" => 9,
+ "(" => 10,
'CHAR' => 4,
"<" => 3,
- "[" => 13,
- "." => 14
+ "[" => 14,
+ 'CODE' => 6,
+ "." => 15
},
GOTOS => {
- 'expr' => 54,
- 'charclass' => 11
+ 'expr' => 55,
+ 'charclass' => 12
}
},
- {#State 37
+ {#State 38
ACTIONS => {
- "?" => 21,
+ "?" => 22,
'CHAR' => 4,
"<" => 3,
- "+" => 22,
- "{" => 23,
- "(" => 9,
- "|" => 25,
- "*" => 26,
- "[" => 13,
- ")" => 55,
- "." => 14
- },
- GOTOS => {
- 'expr' => 27,
- 'charclass' => 11,
- 'range' => 24
+ "+" => 23,
+ "{" => 24,
+ 'CODE' => 6,
+ "(" => 10,
+ "|" => 26,
+ "*" => 27,
+ "[" => 14,
+ ")" => 56,
+ "." => 15
+ },
+ GOTOS => {
+ 'expr' => 28,
+ 'charclass' => 12,
+ 'range' => 25
}
},
- {#State 38
+ {#State 39
DEFAULT => -18
},
- {#State 39
+ {#State 40
DEFAULT => -16
},
- {#State 40
+ {#State 41
ACTIONS => {
- 'NUM' => 58,
- "}" => 56,
- "," => 57
+ 'NUM' => 59,
+ "}" => 57,
+ "," => 58
}
},
- {#State 41
+ {#State 42
ACTIONS => {
- 'NUM' => 42
+ 'NUM' => 43
},
GOTOS => {
- 'number' => 59
+ 'number' => 60
}
},
- {#State 42
- DEFAULT => -33
- },
{#State 43
- DEFAULT => -20
+ DEFAULT => -34
},
{#State 44
+ DEFAULT => -20
+ },
+ {#State 45
ACTIONS => {
- "?" => 21,
+ "?" => 22,
'CHAR' => 4,
"<" => 3,
- "+" => 22,
- "{" => 23,
- "(" => 9,
- "*" => 26,
- "[" => 13,
- "." => 14
+ "+" => 23,
+ "{" => 24,
+ 'CODE' => 6,
+ "(" => 10,
+ "*" => 27,
+ "[" => 14,
+ "." => 15
},
DEFAULT => -9,
GOTOS => {
- 'expr' => 27,
- 'charclass' => 11,
- 'range' => 24
+ 'expr' => 28,
+ 'charclass' => 12,
+ 'range' => 25
}
},
- {#State 45
- DEFAULT => -14
- },
{#State 46
- ACTIONS => {
- "-" => 60
- },
- DEFAULT => -41
+ DEFAULT => -14
},
{#State 47
- DEFAULT => -38
- },
- {#State 48
ACTIONS => {
"-" => 61
},
- DEFAULT => -43
+ DEFAULT => -42
+ },
+ {#State 48
+ DEFAULT => -39
},
{#State 49
- DEFAULT => -36
+ ACTIONS => {
+ "-" => 62
+ },
+ DEFAULT => -44
},
{#State 50
+ DEFAULT => -37
+ },
+ {#State 51
ACTIONS => {
- 'NUM' => 48,
- 'CHAR' => 46,
- "]" => 62
+ 'NUM' => 49,
+ 'CHAR' => 47,
+ "]" => 63
},
GOTOS => {
- 'classpiece' => 47
+ 'classpiece' => 48
}
},
- {#State 51
+ {#State 52
ACTIONS => {
- 'NUM' => 48,
- 'CHAR' => 46,
- "]" => 63
+ 'NUM' => 49,
+ 'CHAR' => 47,
+ "]" => 64
},
GOTOS => {
- 'classpiece' => 47
+ 'classpiece' => 48
}
},
- {#State 52
+ {#State 53
DEFAULT => -25
},
- {#State 53
+ {#State 54
DEFAULT => -3
},
- {#State 54
+ {#State 55
ACTIONS => {
- "?" => 21,
+ "?" => 22,
'CHAR' => 4,
"<" => 3,
- "+" => 22,
- "{" => 23,
- "(" => 9,
- "|" => 25,
- "*" => 26,
- "[" => 13,
- ")" => 64,
- "." => 14
- },
- GOTOS => {
- 'expr' => 27,
- 'charclass' => 11,
- 'range' => 24
+ "+" => 23,
+ "{" => 24,
+ 'CODE' => 6,
+ "(" => 10,
+ "|" => 26,
+ "*" => 27,
+ "[" => 14,
+ ")" => 65,
+ "." => 15
+ },
+ GOTOS => {
+ 'expr' => 28,
+ 'charclass' => 12,
+ 'range' => 25
}
},
- {#State 55
- DEFAULT => -22
- },
{#State 56
- DEFAULT => -30
+ DEFAULT => -22
},
{#State 57
+ DEFAULT => -31
+ },
+ {#State 58
ACTIONS => {
- 'NUM' => 42,
- "}" => 65
+ 'NUM' => 43,
+ "}" => 66
},
GOTOS => {
- 'number' => 66
+ 'number' => 67
}
},
- {#State 58
- DEFAULT => -32
- },
{#State 59
- ACTIONS => {
- 'NUM' => 58,
- "}" => 67
- }
+ DEFAULT => -33
},
{#State 60
ACTIONS => {
- 'CHAR' => 68
+ 'NUM' => 59,
+ "}" => 68
}
},
{#State 61
ACTIONS => {
- 'NUM' => 69
+ 'CHAR' => 69
}
},
{#State 62
- DEFAULT => -35
+ ACTIONS => {
+ 'NUM' => 70
+ }
},
{#State 63
- DEFAULT => -34
+ DEFAULT => -36
},
{#State 64
- DEFAULT => -23
+ DEFAULT => -35
},
{#State 65
- DEFAULT => -31
+ DEFAULT => -23
},
{#State 66
- ACTIONS => {
- 'NUM' => 58,
- "}" => 70
- }
+ DEFAULT => -32
},
{#State 67
- DEFAULT => -29
+ ACTIONS => {
+ 'NUM' => 59,
+ "}" => 71
+ }
},
{#State 68
- DEFAULT => -40
+ DEFAULT => -30
},
{#State 69
- DEFAULT => -42
+ DEFAULT => -41
},
{#State 70
- DEFAULT => -28
+ DEFAULT => -43
+ },
+ {#State 71
+ DEFAULT => -29
}
],
yyrules =>
@@ -1021,19 +1061,19 @@
[#Rule 1
'rules', 2,
sub
-#line 41 "lib/Regex/Grammar.y"
+#line 67 "lib/Regex/Grammar.y"
{ return [ @{ $_[1] }, $_[2] ] }
],
[#Rule 2
'rules', 0,
sub
-#line 43 "lib/Regex/Grammar.y"
+#line 69 "lib/Regex/Grammar.y"
{ return []; }
],
[#Rule 3
'rule', 4,
sub
-#line 47 "lib/Regex/Grammar.y"
+#line 73 "lib/Regex/Grammar.y"
{ my ($name, $tree) = @_[2,4];
# This is really not the place for this...
if ($tree->{name} eq 'scan' && $name ne 'default') {
@@ -1047,7 +1087,7 @@
[#Rule 4
'rule', 1,
sub
-#line 57 "lib/Regex/Grammar.y"
+#line 83 "lib/Regex/Grammar.y"
{
my $op = op('rule' => [ 'default', $_[1], 1 + $::paren ]);
$::paren = 0; # FIXME!
@@ -1057,235 +1097,241 @@
[#Rule 5
'regex', 2,
sub
-#line 65 "lib/Regex/Grammar.y"
+#line 91 "lib/Regex/Grammar.y"
{ return $_[2]; }
],
[#Rule 6
'regex', 1,
sub
-#line 67 "lib/Regex/Grammar.y"
+#line 93 "lib/Regex/Grammar.y"
{ return op('scan' => [ $_[1] ]); }
],
[#Rule 7
'regex1', 1,
sub
-#line 71 "lib/Regex/Grammar.y"
+#line 97 "lib/Regex/Grammar.y"
{ return $_[1]; }
],
[#Rule 8
'regex1', 2,
sub
-#line 73 "lib/Regex/Grammar.y"
+#line 99 "lib/Regex/Grammar.y"
{ return op('seq' => [ $_[1], op('atend') ]); }
],
[#Rule 9
'expr', 3,
sub
-#line 77 "lib/Regex/Grammar.y"
+#line 103 "lib/Regex/Grammar.y"
{ return op('alternate' => [ $_[1], $_[3] ]); }
],
[#Rule 10
'expr', 2,
sub
-#line 79 "lib/Regex/Grammar.y"
+#line 105 "lib/Regex/Grammar.y"
{ return op('seq' => [ $_[1], $_[2] ]); }
],
[#Rule 11
'expr', 1,
sub
-#line 81 "lib/Regex/Grammar.y"
+#line 107 "lib/Regex/Grammar.y"
{ return op('match' => [ ord($_[1]) ]); }
],
[#Rule 12
'expr', 1,
sub
-#line 83 "lib/Regex/Grammar.y"
+#line 109 "lib/Regex/Grammar.y"
{ return op('classpieces' => [ $_[1] ]); }
],
[#Rule 13
'expr', 2,
sub
-#line 85 "lib/Regex/Grammar.y"
+#line 111 "lib/Regex/Grammar.y"
{ return op('multi_match' => [ 0, -1, TRUE, $_[1] ]); }
],
[#Rule 14
'expr', 3,
sub
-#line 87 "lib/Regex/Grammar.y"
+#line 113 "lib/Regex/Grammar.y"
{ return op('multi_match' => [ 0, -1, FALSE, $_[1] ]); }
],
[#Rule 15
'expr', 2,
sub
-#line 89 "lib/Regex/Grammar.y"
+#line 115 "lib/Regex/Grammar.y"
{ return op('multi_match' => [ 1, -1, TRUE, $_[1] ]); }
],
[#Rule 16
'expr', 3,
sub
-#line 91 "lib/Regex/Grammar.y"
+#line 117 "lib/Regex/Grammar.y"
{ return op('multi_match' => [ 1, -1, FALSE, $_[1] ]); }
],
[#Rule 17
'expr', 2,
sub
-#line 93 "lib/Regex/Grammar.y"
+#line 119 "lib/Regex/Grammar.y"
{ return op('multi_match' => [ 0, 1, TRUE, $_[1] ]); }
],
[#Rule 18
'expr', 3,
sub
-#line 95 "lib/Regex/Grammar.y"
+#line 121 "lib/Regex/Grammar.y"
{ return op('multi_match' => [ 0, 1, FALSE, $_[1] ]); }
],
[#Rule 19
'expr', 2,
sub
-#line 97 "lib/Regex/Grammar.y"
+#line 123 "lib/Regex/Grammar.y"
{ return op('multi_match' => [ $_[2]->{min}, $_[2]->{max}, TRUE, $_[1] ]); }
],
[#Rule 20
'expr', 3,
sub
-#line 99 "lib/Regex/Grammar.y"
+#line 125 "lib/Regex/Grammar.y"
{ return op('multi_match' => [ $_[2]->{min}, $_[2]->{max}, FALSE, $_[1] ]); }
],
[#Rule 21
'@1-1', 0,
sub
-#line 100 "lib/Regex/Grammar.y"
+#line 126 "lib/Regex/Grammar.y"
{ ++$::paren }
],
[#Rule 22
'expr', 4,
sub
-#line 101 "lib/Regex/Grammar.y"
+#line 127 "lib/Regex/Grammar.y"
{ return op('group' => [ $_[3], $_[2] ]) }
],
[#Rule 23
'expr', 5,
sub
-#line 103 "lib/Regex/Grammar.y"
+#line 129 "lib/Regex/Grammar.y"
{ return $_[4]; }
],
[#Rule 24
'expr', 3,
sub
-#line 105 "lib/Regex/Grammar.y"
+#line 131 "lib/Regex/Grammar.y"
{ return op('call' => [ $_[2], 0 ]) }
],
[#Rule 25
'expr', 4,
sub
-#line 107 "lib/Regex/Grammar.y"
+#line 133 "lib/Regex/Grammar.y"
{ return op('group', [ op('call' => [ $_[3], 1 ]), ++$::paren ]) }
],
[#Rule 26
+ 'expr', 1,
+sub
+#line 135 "lib/Regex/Grammar.y"
+{ return op('code' => [ $_[1] ]) }
+ ],
+ [#Rule 27
'rulename', 2,
sub
-#line 111 "lib/Regex/Grammar.y"
+#line 139 "lib/Regex/Grammar.y"
{ return $_[1] . $_[2]; }
],
- [#Rule 27
+ [#Rule 28
'rulename', 0,
sub
-#line 113 "lib/Regex/Grammar.y"
+#line 141 "lib/Regex/Grammar.y"
{ return '' }
],
- [#Rule 28
+ [#Rule 29
'range', 5,
sub
-#line 117 "lib/Regex/Grammar.y"
+#line 145 "lib/Regex/Grammar.y"
{ return { min => $_[2], max => $_[4] }; }
],
- [#Rule 29
+ [#Rule 30
'range', 4,
sub
-#line 119 "lib/Regex/Grammar.y"
+#line 147 "lib/Regex/Grammar.y"
{ return { min => 0, max => $_[3] }; }
],
- [#Rule 30
+ [#Rule 31
'range', 3,
sub
-#line 121 "lib/Regex/Grammar.y"
+#line 149 "lib/Regex/Grammar.y"
{ return { min => $_[2], max => $_[2] }; }
],
- [#Rule 31
+ [#Rule 32
'range', 4,
sub
-#line 123 "lib/Regex/Grammar.y"
+#line 151 "lib/Regex/Grammar.y"
{ return { min => $_[2], max => -1 }; }
],
- [#Rule 32
+ [#Rule 33
'number', 2,
sub
-#line 126 "lib/Regex/Grammar.y"
+#line 154 "lib/Regex/Grammar.y"
{ return $_[1] * 10 + $_[2]; }
],
- [#Rule 33
+ [#Rule 34
'number', 1,
sub
-#line 127 "lib/Regex/Grammar.y"
+#line 155 "lib/Regex/Grammar.y"
{ return $_[1]; }
],
- [#Rule 34
+ [#Rule 35
'charclass', 4,
sub
-#line 130 "lib/Regex/Grammar.y"
+#line 158 "lib/Regex/Grammar.y"
{ return [ 'neg', $_[3] ]; }
],
- [#Rule 35
+ [#Rule 36
'charclass', 4,
sub
-#line 131 "lib/Regex/Grammar.y"
+#line 159 "lib/Regex/Grammar.y"
{ push @{$_[3]}, '-'; return $_[3]; }
],
- [#Rule 36
+ [#Rule 37
'charclass', 3,
sub
-#line 132 "lib/Regex/Grammar.y"
+#line 160 "lib/Regex/Grammar.y"
{ return $_[2]; }
],
- [#Rule 37
+ [#Rule 38
'charclass', 1,
sub
-#line 133 "lib/Regex/Grammar.y"
+#line 161 "lib/Regex/Grammar.y"
{ return [ 'neg', [] ]; }
],
- [#Rule 38
+ [#Rule 39
'classpieces', 2,
sub
-#line 136 "lib/Regex/Grammar.y"
+#line 164 "lib/Regex/Grammar.y"
{ push @{$_[1]}, $_[2]; return $_[1]; }
],
- [#Rule 39
+ [#Rule 40
'classpieces', 0,
sub
-#line 137 "lib/Regex/Grammar.y"
+#line 165 "lib/Regex/Grammar.y"
{ return []; }
],
- [#Rule 40
+ [#Rule 41
'classpiece', 3,
sub
-#line 141 "lib/Regex/Grammar.y"
+#line 169 "lib/Regex/Grammar.y"
{ return [ $_[1], $_[3] ] }
],
- [#Rule 41
+ [#Rule 42
'classpiece', 1,
sub
-#line 143 "lib/Regex/Grammar.y"
+#line 171 "lib/Regex/Grammar.y"
{ return $_[1]; }
],
- [#Rule 42
+ [#Rule 43
'classpiece', 3,
sub
-#line 145 "lib/Regex/Grammar.y"
+#line 173 "lib/Regex/Grammar.y"
{ return [ $_[1], $_[3] ] }
],
- [#Rule 43
+ [#Rule 44
'classpiece', 1,
sub
-#line 147 "lib/Regex/Grammar.y"
+#line 175 "lib/Regex/Grammar.y"
{ return $_[1]; }
]
],
@@ -1293,7 +1339,7 @@
bless($self,$class);
}
-#line 150 "lib/Regex/Grammar.y"
+#line 178 "lib/Regex/Grammar.y"
1;
1.12 +38 -10 parrot/languages/regex/lib/Regex/Grammar.y
Index: Grammar.y
===================================================================
RCS file: /cvs/public/parrot/languages/regex/lib/Regex/Grammar.y,v
retrieving revision 1.11
retrieving revision 1.12
diff -u -r1.11 -r1.12
--- Grammar.y 25 Nov 2004 03:00:30 -0000 1.11
+++ Grammar.y 25 Nov 2004 04:37:57 -0000 1.12
@@ -5,20 +5,46 @@
sub tokenize {
my $data = shift;
- my @tokens = $data =~ /(\\.|.)/g;
+
+ my @tokens;
my @types;
- foreach (@tokens) {
- if (/^\\(.)/) {
- $_ = $1;
- push @types, 'CHAR';
- } elsif (/\d/) {
- push @types, 'NUM';
- } elsif (/[\w\s]/) {
+ my $state = 0;
+ my $depth = 0;
+ for my $c (split(//, $data)) {
+ if ($state == 0) {
+ if ($c eq '\\') {
+ $state = 1;
+ } elsif ($c eq '{') {
+ push @tokens, '{';
+ push @types, 'CODE';
+ $state = 2;
+ $depth = 1;
+ } else {
+ push @tokens, $c;
+ if ($c =~ /\d/) {
+ push @types, 'NUM';
+ } elsif ($c =~ /[\w\s]/) {
+ push @types, 'CHAR';
+ } else {
+ push @types, $c;
+ }
+ }
+ } elsif ($state == 1) {
+ push @tokens, $c;
push @types, 'CHAR';
+ $state = 0;
} else {
- push @types, $_;
+ if ($c eq '{') {
+ ++$depth;
+ } elsif ($c eq '}') {
+ if (--$depth == 0) {
+ $state = 0;
+ }
+ }
+ $tokens[-1] .= $c;
}
}
+
return [EMAIL PROTECTED], [EMAIL PROTECTED];
}
@@ -39,7 +65,7 @@
rules : rules rule
{ return [ @{ $_[1] }, $_[2] ] }
- |
+ |
{ return []; }
;
@@ -105,6 +131,8 @@
{ return op('call' => [ $_[2], 0 ]) }
| '<' '?' rulename '>'
{ return op('group', [ op('call' => [ $_[3], 1 ]), ++$::paren ]) }
+ | CODE
+ { return op('code' => [ $_[1] ]) }
;
rulename : rulename CHAR
1.11 +8 -0 parrot/languages/regex/lib/Regex/CodeGen/IMCC.pm
Index: IMCC.pm
===================================================================
RCS file: /cvs/public/parrot/languages/regex/lib/Regex/CodeGen/IMCC.pm,v
retrieving revision 1.10
retrieving revision 1.11
diff -u -r1.10 -r1.11
--- IMCC.pm 25 Nov 2004 03:51:28 -0000 1.10
+++ IMCC.pm 25 Nov 2004 04:37:58 -0000 1.11
@@ -477,4 +477,12 @@
"unless <rx_tmp>, $fail_label");
}
+sub output_code {
+ my ($self, $code) = @_;
+ # Assume, for now, that the code is PIR code
+ return ("# START EMBEDDED CODE",
+ split(/\n/, substr($code, 1, -1)),
+ "# END EMBEDDED CODE");
+}
+
1;
1.2 +6 -0 parrot/languages/regex/lib/Regex/CodeGen/Perl5.pm
Index: Perl5.pm
===================================================================
RCS file: /cvs/public/parrot/languages/regex/lib/Regex/CodeGen/Perl5.pm,v
retrieving revision 1.1
retrieving revision 1.2
diff -u -r1.1 -r1.2
--- Perl5.pm 18 Nov 2004 07:01:13 -0000 1.1
+++ Perl5.pm 25 Nov 2004 04:37:58 -0000 1.2
@@ -460,4 +460,10 @@
return (@ops, "goto $fail_label if ! \$$uid\->{'!RESULT'};");
}
+sub output_code {
+ my ($self, $code) = @_;
+ # Assume, for now, that the code is Perl5 code
+ return ("# START EMBEDDED CODE", $code, "# END EMBEDDED CODE");
+}
+
1;
1.14 +12 -0 parrot/languages/regex/lib/Regex/Ops/Tree.pm
Index: Tree.pm
===================================================================
RCS file: /cvs/public/parrot/languages/regex/lib/Regex/Ops/Tree.pm,v
retrieving revision 1.13
retrieving revision 1.14
diff -u -r1.13 -r1.14
--- Tree.pm 8 Nov 2004 03:15:48 -0000 1.13
+++ Tree.pm 25 Nov 2004 04:38:00 -0000 1.14
@@ -22,6 +22,7 @@
# scan : Scan through input until R matches
# atend : At the end of the input?
# advance : Unconditionally advance 1 char
+# code : Embedded code, in some language
# Stuff that is used for optimization
# -----------------------------------
@@ -48,6 +49,7 @@
@Regex::Ops::Tree::check::ISA = qw(Regex::Ops::Tree);
@Regex::Ops::Tree::call::ISA = qw(Regex::Ops::Tree);
[EMAIL PROTECTED]::Ops::Tree::code::ISA = qw(Regex::Ops::Tree);
# Construct a new op
sub op {
@@ -342,6 +344,16 @@
sub hasback { 1 }
sub startset { undef }
+# Embedded code is truly unpredictable. Although there will probably
+# be pragmata for allowing code to specify that it won't muck with
+# things.
+package Regex::Ops::Tree::code;
+sub minlen { 0 }
+sub maxlen { undef }
+sub dfa_safe { 0 }
+sub hasback { 0 } # FIXME! code should be allowed to have BACK{} blocks
+sub startset { undef }
+
########################################################################
# Rendering - only used for debugging for now
########################################################################