Let's talk about this common Marpa idiom I use in my own code:
my ($len, $value);
eval { $len = $parser->read(\$input) };
if ($@) {
    chomp $@;
    return (undef, $@);
} elsif (!($value = $parser->value())) {
    return (undef, "Parse failed.");
}
# empty input streams are $value = \undef
return ($$value || [], undef);
My question is: why are there two different error paths? The only time I ever
see value() fail (that is, return undef) is when the input looks valid but then
abruptly hits EOF. In the two cases below where "Parse failed." is returned
(which means value() returned undef), the input was valid *up to that point*.
In the cases where read() fails, the input could never have been valid, no
matter what followed it. These are obviously intentionally broken unit tests,
since they exist to test the grammar's error handling, but what I've always
wondered is why some error cases surface in read() and others in value().
The docs say this WRT error-handling:
read():
A parse is said to be exhausted if, based on the input read so far, there is no
way for it to continue successfully. Exhaustion is not a problem if that Marpa
has read all the way to the end of the input, or if it is pausing for some
other reason. Otherwise, read() treats an exhausted parse as a failure.
On failure, read() throws an exception. The call is considered successful if it
ended because a parse was found, or because internal scanning was paused. On
success, read() returns the location in the input stream at which internal
scanning ended. This value may be zero.
value():
The value method call evaluates the next parse tree in the parse series, and
returns a reference to the parse result for that parse tree. If there are no
more parse trees, the value method returns undef.
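To make the two paths concrete, here is a minimal sketch. It assumes the
Marpa::R2 SLIF interface and uses a made-up toy grammar ("name = 'value'"),
not the real one; parse_string() is a hypothetical helper. It only labels
which call reported the failure.

```perl
use strict;
use warnings;
use Marpa::R2;

# Toy grammar (an assumption for illustration): name = 'quoted value'
my $dsl = <<'END_DSL';
:default ::= action => ::array
:start ::= pair
pair ::= name '=' value
name ~ [\w]+
value ~ ['] [^']* [']
:discard ~ whitespace
whitespace ~ [\s]+
END_DSL

my $grammar = Marpa::R2::Scanless::G->new({ source => \$dsl });

# Hypothetical helper: returns (value, undef) on success or
# (undef, message) on failure, saying which call failed.
sub parse_string {
    my ($input) = @_;
    my $recce = Marpa::R2::Scanless::R->new({ grammar => $grammar });

    # Path 1: read() throws when it cannot continue before end of input.
    if (!eval { $recce->read(\$input); 1 }) {
        my ($first_line) = split /\n/, $@;
        return (undef, "read() failed: $first_line");
    }

    # Path 2: read() consumed all the input, but no complete parse exists.
    my $value_ref = $recce->value();
    if (!defined $value_ref) {
        return (undef, "value() returned undef: input ended mid-parse");
    }
    return ($$value_ref, undef);
}

for my $input ("tag = 'x'", "tag huh", "tag =") {
    my ($value, $err) = parse_string($input);
    print "$input => ", (defined $err ? $err : "ok"), "\n";
}
```

In this sketch "tag huh" should fail in read(), because no continuation can
make it valid, while "tag =" should get through read() and only fail in
value(), because more input could still have completed it.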
Maybe what I'm really asking is why 'partial input' cases are considered valid
by read() but not by value(). Going by the wording of the read() docs, "there
is no way for it to continue successfully", wouldn't hitting the end of input
count as exactly that? In Parse::RecDescent (PRD) this is usually handled by
matching the token /\Z/, but I'm not sure how to do the same thing with Marpa,
or whether I should even be doing that.
I'm not going to post the whole grammar because it's large and basically
self-evident "item name tag = value" type stuff.
Unit tests that show different behavior (ignore the PRD stuff, it's in there
for testing the PRD implementation of the same grammar):
# busted case
$in = <<'EOF';
item this-is-not-wrong
tag huh
EOF
$res_out = {
    'marpa' => [
        undef,
        'Error in SLIF parse: No lexeme found at line 2, column 6
* String before error: item this-is-not-wrong\\n\\ttag\\s
* The error was at line 2, column 6, and at character 0x0068 \'h\', ...
* here: huh\\n
'
    ],
    'recdescent' => [
        undef,
        'ERROR (line 1): Invalid Items: Was expecting EOF but found "tag huh" instead'
    ],
};
@$res = Verdad::parseString($in);
is_deeply_wrap_exc($res, $res_out->{$parser});
# busted case
$in = <<'EOF';
item this-is-not-wrong
tag =
EOF
$res_out = {
    'marpa' => [
        undef,
        'Parse failed.'
    ],
    'recdescent' => [
        undef,
        'ERROR (line 1): Invalid Items: Was expecting EOF but found "tag =" instead'
    ],
};
@$res = Verdad::parseString($in);
is_deeply_wrap_exc($res, $res_out->{$parser});
# busted case
$in = <<'EOF';
item this-is-not-wrong
tag = 'busted
EOF
$res_out = {
    'marpa' => [
        undef,
        'Parse failed.'
    ],
    'recdescent' => [
        undef,
        'ERROR (line 1): Invalid Items: Was expecting EOF but found "tag = \'busted" instead'
    ],
};
@$res = Verdad::parseString($in);
is_deeply_wrap_exc($res, $res_out->{$parser});