No, this is not about relationship troubles.
I am struggling to work with rejection events. I am trying to deal with
constructs like preprocessor statements or meaningful comments in
programming languages. These (i) can appear anywhere in the grammar,
(ii) need to be propagated into the parse tree, (iii) may affect the parse
itself, and (iv) cannot easily be parsed with a grammar or an internal lexer.
My idea was to define lexemes for such constructs and attach them to fake G1
productions. The lexeme would be tried when the relevant text is encountered
and would trigger a rejection event. In the handler for that event I would
parse the text of the construct with an external recognizer, insert the
proper text back into the input string, and set the parse to continue at the
start of the replacement text. If the replacement text is legal at the
insertion point, parsing should continue just fine, thanks to the great
infrastructure provided by Marpa.
However, things did not go as planned. Please see the attached example
for details. In this example, I try to handle preprocessor statements
(#ifdef).
I created a very simple grammar, and added these productions:
    fakecpp ::= cpp
    cpp ~ '#'
The fakecpp production is not actually reachable. However, given an input
string such as:
    abc\n#ifdef A\n=\n#else\n+\n#endif\n12
we get a rejection event when we hit the "#ifdef", and I thought I could
clean it up in the handler:
    $pos = $pos + $len - $newlen + 1;
    substr($string, $pos, $newlen) = $cpp2;
(Here $string is the original string, $pos is the current position, $len is
the total length of the ifdef block, $newlen is the length of the
replacement text, and $cpp2 is the replacement text.) The idea is to place
the replacement text at the end of the ifdef block and set the position to
just before the replacement text. I hoped that upon resume the parser would
read the replacement text and be happy.
No such luck. Note that I did get the following to work: find out which
lexeme was expected, read it externally (lexeme_read), and proceed with the
text after the ifdef block.
    $pos = $pos + $len + 1;
    $recce->lexeme_read('OP', $pos, 1, '=');
But this approach only works because this grammar is so simple and I can
easily deal with all cases of possible rejections by looking at the
expected lexemes.
Note that if I put the "=" into the input string and try to continue
parsing from before it, I get another rejection event at this very point.
This is really strange because the grammar expects an OP, I give it an OP,
but it cannot parse it.
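As a sanity check, the offset arithmetic from the handler can be run on its
own, outside Marpa. The following is a minimal standalone sketch, not the
attached script. It assumes the rejection is reported at the offset of the
'#', and it uses $+[0] for the match length in place of the hand-computed
sum; both choices are assumptions made for illustration. It only inspects a
plain Perl copy of the string and says nothing about what the recognizer
sees after resume.

```perl
use strict; use warnings;

# Same input as in the attached script.
my $string = "abc\n#ifdef A\n=\n#else\n+\n#endif\n12\n";

# Assumption: the rejection is reported at the '#' of "#ifdef".
my $pos = index($string, '#');

# Same pattern as in the handler.
substr($string, $pos) =~
    /^#ifdef(\s*)(\w+)*(\s*\n)(.*?)\n#else(.*?)\n#endif/s
    or die "No complete #ifdef block at offset $pos\n";

my $cpp2   = $4;            # text of the #ifdef branch
my $len    = $+[0];         # length of the whole matched block
my $newlen = length $cpp2;

# Same arithmetic as in the handler.
$pos = $pos + $len - $newlen + 1;
substr($string, $pos, $newlen) = $cpp2;

printf "pos=%d char='%s'\n", $pos, substr($string, $pos, 1);
```

Under these assumptions $pos ends up at 29 and the '=' overwrites the
newline that followed "#endif", so this copy of the string does hold an OP
character at the position passed to resume.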
I must be doing something wrong, since it seems there should be a way of
getting this to work.
Any suggestions would be greatly appreciated.
Thanks, Th.
use strict; use warnings; use 5.010;
package Parser;
use Data::Dumper;
use Marpa::R2;
my $grammar;
BEGIN {
    $grammar = Marpa::R2::Scanless::G->new({
        bless_package => 'Ast',
        source        => \<<'END SOURCE',
:default ::= action => [start,values] bless => ::lhs
inaccessible is ok by default
:start ::= list
:discard ~ ws
list ::= content +
content ::= word operator number
word ::= W
number ::= N
operator ::= OP
W ~ [a-zA-Z_]+
N ~ [0-9]+
OP ~ [=/*+-]
ws ~ [\s]+
fakecpp ::= cpp
cpp ~ '#'
END SOURCE
    });
}
my $data = <<EOFDATA;
abc
#ifdef A
=
#else
+
#endif
12
EOFDATA
my $ast = parse(\$data);
print Dumper $ast;
#---------------------------------------------------------------------
sub parse {
    my ($ref) = @_;
    my $recce = Marpa::R2::Scanless::R->new({
        grammar           => $grammar,
        rejection         => 'event',
        semantics_package => 'C',
        # trace_terminals => 1,
        # trace_values    => 1,
    });
    my $val = process($recce, $ref) // die "No parse found";
    return $$val;
}
sub process {
    my($recce, $ref) = @_;
    my $string = $$ref;
    my($length) = length $string;
    my(@event, $event_name);
    my($lexeme, $lexeme_name, $literal);
    my($start, $span);
    my($value);
    # read() and resume() return the position at which the recognizer stopped.
    for
    (
        my $pos = $recce -> read(\$string);
        $pos < $length;
        $pos = $recce -> resume($pos)
    )
    {
        @event = @{$recce -> events};
        $event_name = ${$event[0]}[0];
        ($start, $span) = $recce -> pause_span;
        $lexeme_name = $recce -> pause_lexeme;
        $lexeme = $recce -> literal($start, $span) if defined $start;
        # The event name is the literal string 'rejected (leading quote included).
        if ($event_name eq q('rejected)) {
            if (substr($string,$pos,1) eq '#') {
                # External recognizer: match the whole #ifdef/#else/#endif block.
                my $found = substr($string, $pos) =~
                    /^#ifdef(\s*)(\w+)*(\s*\n)(.*?)\n#else(.*?)\n#endif/so;
                die "Incomplete lexeme found at:".substr($string,$pos,10)."\n" unless $found;
                my $cpp1 = $2;    # macro name
                my $cpp2 = $4;    # text of the #ifdef branch (used as replacement)
                my $cpp3 = $5;    # text of the #else branch
                my $newlen = length($cpp2);
                # 19 = length of "#ifdef" + "\n#else" + "\n#endif"
                my $len =
                    length($1) + length($cpp1) + length($3) +
                    $newlen + length($cpp3) + 19;
                # interesting data:
                # @{$recce->terminals_expected}
                # $recce->progress
                # $recce->current_g1_location
                my ($line, $col) = $recce->line_column();
                $pos = $pos + $len - $newlen + 1;
                substr($string, $pos, $newlen) = $cpp2;
            } else {
                # We now have another rejection
                die "Rejected lexeme at:".substr($string,$pos,10)."\n";
            }
        } else {
            die "Unexpected lexeme '$lexeme_name' with a pause\n";
        }
    }
    return $recce -> value;
}
__END__