Handling rejection

Thomas Weigert Mon, 09 Feb 2015 16:22:06 -0800

No, this is not about relationship troubles.

I am struggling to work with rejection events. I am trying to deal with 
constructs like preprocessor statements or meaningful comments in 
programming languages. These (i) can go anywhere in the grammar, (ii) 
need to be propagated into the parse tree, (iii) may affect the parse 
itself, and (iv) cannot easily be parsed with a grammar or an internal lexer.


My idea was to create lexemes, invoked by fake G1 productions, which would 
be tried when the relevant text is encountered and would trigger a 
rejection event. In the handler for that event, I would parse the text of 
the construct with an external recognizer, insert the proper replacement 
text back into the input string, and set the parse to continue at the start 
of the replacement text. If the replacement text is legal at that point, 
parsing should continue just fine, thanks to the great infrastructure 
provided by Marpa.

However, things did not go as planned. Please look at the attached example 
for details. In this example, I try to handle preprocessor statements 
(#ifdef).

I created a very simple grammar, and added these productions:

fakecpp ::= cpp
cpp ~ '#'

The fakecpp production is not actually reachable. However, given an input 
string such as
       abc\n#ifdef A\n=\n#else\n+\n#endif\n12
we get a rejection event when we hit the "#ifdef", and in the handler I 
thought I could clean it up:
            $pos = $pos + $len - $newlen + 1;
            substr($string, $pos, $newlen) = $cpp2;
($string is the original string, $pos is the current position, $len is the 
total length of the ifdef, $newlen is the length of the replacement text, 
and $cpp2 is the replacement text). I insert the replacement text at the 
end of the ifdef and set the position to before the replacement text. Now I 
hoped that upon resume the parser would get the replacement text and be 
happy. 
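
The position arithmetic above can be tried on a plain Perl string, 
independently of Marpa. The following is only a sketch under stated 
assumptions: splice_branch is a hypothetical helper, $pos is assumed to be 
the offset of the '#', the pattern is a simplified version of the one in 
the attached script, and the directive length is taken from Perl's @+ 
array instead of being summed from the captures. It shows what substr() 
does to the string; it makes no claim about what the recognizer sees after 
resume().

```perl
use strict; use warnings;

# Hypothetical helper: $pos is assumed to be the offset of the '#'.
# Returns the modified string and the position to continue from.
sub splice_branch {
    my ($string, $pos) = @_;

    # Simplified pattern, matched against the text starting at $pos.
    substr($string, $pos) =~ /^#ifdef\s*\w*\s*\n(.*?)\n#else.*?\n#endif/s
        or die "Incomplete directive\n";
    my $len    = $+[0];          # length of the whole match
    my $cpp2   = $1;             # text of the #ifdef branch
    my $newlen = length $cpp2;

    # Same arithmetic as in the message: the overwritten region ends one
    # character past the end of the directive.
    $pos = $pos + $len - $newlen + 1;
    substr($string, $pos, $newlen) = $cpp2;
    return ($string, $pos);
}

my ($new, $at) = splice_branch("abc\n#ifdef A\n=\n#else\n+\n#endif\n12", 4);
print "continue at $at with: ", substr($new, $at), "\n";
```

Because the replacement has the same length as the region it overwrites, 
the total length of the string does not change.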

No such luck. I did get the following to work: find out which lexeme was 
expected, read it externally with lexeme_read, and proceed with the text 
after it.
                $pos = $pos + $len + 1;
                $recce->lexeme_read('OP', $pos, 1, '=');
But this approach only works because this grammar is so simple and I can 
easily deal with all cases of possible rejections by looking at the 
expected lexemes.
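
For illustration, "looking at the expected lexemes" might be generalized 
along the following lines. This is a sketch only: terminal_for is a 
hypothetical helper, the patterns are copied from the lexeme rules of the 
attached grammar, and the list of expected names is assumed to come from 
$recce->terminals_expected. It covers only replacement texts that consist 
of a single token with no surrounding whitespace.

```perl
use strict; use warnings;

# Lexeme patterns copied from the attached grammar.
my %pattern = (
    W  => qr/[a-zA-Z_]+/,
    N  => qr/[0-9]+/,
    OP => qr{[=/*+-]},
);

# Hypothetical helper: return the name of the first expected terminal whose
# pattern matches all of $text, or undef if none does.
sub terminal_for {
    my ($text, @expected) = @_;
    for my $name (@expected) {
        my $re = $pattern{$name} or next;   # skip names without a pattern
        return $name if $text =~ /\A$re\z/;
    }
    return;
}

# In the handler, the list would be @{ $recce->terminals_expected }.
my $name = terminal_for('=', 'OP');
print defined $name ? "read '=' as $name\n" : "no expected terminal matches\n";
```

The returned name would then be passed to lexeme_read in place of the 
hard-coded 'OP'.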

Note that if I put the "=" into the input string and try to continue 
parsing from before it, I get another rejection event at this very point. 
This is really strange because the grammar expects an OP, I give it an OP, 
but it cannot parse it.

Intuitively, there is something I must be doing wrong as it seems there 
should be a way of getting this to work.

Any suggestions would be greatly appreciated.

Thanks, Th.


use strict; use warnings; use 5.010;

package Parser;

use Data::Dumper;
use Marpa::R2;

my $grammar;

BEGIN {
    $grammar = Marpa::R2::Scanless::G->new({
        bless_package => 'Ast',
        source => \<<'END SOURCE',
            :default    ::= action => [start,values] bless => ::lhs
            inaccessible is ok by default
            :start      ::= list
            :discard    ~ ws

list ::= content +
content ::= word operator number
word ::= W
number ::= N
operator ::= OP

W ~ [a-zA-Z_]+
N ~ [0-9]+
OP ~ [=/*+-]
ws      ~ [\s]+

fakecpp ::= cpp
cpp ~ '#'

END SOURCE
    });
}

my $data = <<EOFDATA;
abc 
#ifdef A 
= 
#else
+
#endif
12
EOFDATA

my $ast = parse(\$data);
print Dumper $ast;


#---------------------------------------------------------------------

sub parse {
    my ($ref) = @_;
    my $recce = Marpa::R2::Scanless::R->new({
        grammar           => $grammar,
        rejection         => 'event',
        semantics_package => 'C',
#        trace_terminals => 1,
#        trace_values => 1,
    });
    my $val = process($recce, $ref) // die "No parse found";
    return $$val;
}

sub process {
	my($recce, $ref) = @_;
	my $string = $$ref;
	my($length) = length $string;

	my(@event, $event_name);
	my($lexeme, $lexeme_name, $literal);
	my($start, $span);
	my($value);

	for
	(
		my $pos = $recce -> read(\$string);
		$pos < $length;
		$pos = $recce -> resume($pos)
	)
	{
		@event          = @{$recce -> events};
		$event_name     = ${$event[0]}[0];
		($start, $span) = $recce -> pause_span;
		$lexeme_name    = $recce -> pause_lexeme;
		$lexeme         = $recce -> literal($start, $span) if defined $start;

		if ($event_name eq q('rejected)) {
		    if (substr($string,$pos,1) eq '#') {
			my $found = substr($string, $pos) =~ 
			    /^#ifdef(\s*)(\w+)*(\s*\n)(.*?)\n#else(.*?)\n#endif/so;
			die "Incomplete lexeme found at:".substr($string,$pos,10)."\n" unless $found;
			my $cpp1 = $2;
			my $cpp2 = $4;
			my $cpp3 = $5;
			my $newlen = length($cpp2);
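			# Total length of the directive: the captured parts plus the
			# 19 literal characters of "#ifdef", "\n#else" and "\n#endif".
			my $len = 
			    length($1) + length($cpp1) + length($3) +
			    $newlen + length($cpp3) + 19;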
			# interesting data:
			# @{$recce->terminals_expected}
			# $recce->progress
			# $recce->current_g1_location
			my ($line, $col) = $recce->line_column();

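			# Move $pos so that the $newlen characters starting there end
			# one character past the directive, then overwrite them with
			# the text of the #ifdef branch.
			$pos = $pos + $len - $newlen + 1;
			substr($string, $pos, $newlen) = $cpp2;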
		    } else {
			# We now have another rejection
			die "Rejected lexeme at:".substr($string,$pos,10)."\n";
		    }
		} else {
			die "Unexpected lexeme '$lexeme_name' with a pause\n";
		}
	}

	return $recce -> value;

}


__END__
