In the following I will speak of "valid prefixes" and "sentences in the
language". An input string is a sentence if there is a parse for it in
the language (that is, if it is consistent with the grammar). A prefix
is valid if and only if it is the prefix of a sentence.

read() fails whenever the input is not a valid prefix -- in other words, when
there is simply no way of continuing the input which will produce a
valid parse. (Marpa is unusual in being able to detect this point exactly.)

At any point, the input might be a valid prefix of a sentence, but not a
sentence. This can happen at the end of input -- if continued, the
input might still produce a valid parse, but at the point where you
chose to end things, the input is not a sentence and there is no parse.
That is the case in which read() succeeds and value() returns undef.
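
To make the three cases concrete, here is a toy sketch in plain Perl. It does not use Marpa at all: the "language" is simply strings of balanced parentheses, and classify() is a hypothetical helper invented for this illustration. It reports whether an input is a sentence, a valid prefix that is not yet a sentence, or not a valid prefix.

```perl
use strict;
use warnings;

# Toy language (an assumption for illustration, not Marpa):
# strings of balanced parentheses.
# Returns 'sentence', 'prefix only', or 'not a prefix'.
sub classify {
    my ($input) = @_;
    my $depth = 0;
    for my $char ( split //, $input ) {
        if    ( $char eq '(' ) { $depth++ }
        elsif ( $char eq ')' ) { $depth-- }
        else                   { return 'not a prefix' }

        # A ')' with nothing open can never be repaired by more input.
        return 'not a prefix' if $depth < 0;
    }

    # Still-open parentheses could be closed by continuing the input.
    return $depth == 0 ? 'sentence' : 'prefix only';
}

print classify('(()())'), "\n";    # sentence
print classify('(()'),    "\n";    # prefix only
print classify('())('),   "\n";    # not a prefix
```

In these terms, 'not a prefix' corresponds to the point where read() fails, and 'prefix only' at end of input corresponds to read() succeeding while value() finds no parse.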
Obviously, to write Marpa, I had to work out all these distinctions
carefully. Another question is whether this distinction needs to be so
prominent in the interface. In many cases, this distinction is
important to the user, and the emphasis is needed. When I created the
interface, I did not know whether these distinctions were important in
most cases -- I did not even know which interfaces would be most popular
and what would be the focus of the users they attracted.
Perhaps I should add a doit() method which simplifies what seems to have
emerged as the most common use case -- reading an input to the end, with
no events, and returning one, and only one, parse.
Questions:
1.) What should be its name? [Probably not doit() ].
2.) I think doit() should throw an exception on error -- that's most
convenient in simple apps. Does that sound right?
3.) Also, doit() could catch ambiguous parses, and throw an error on
those, with an error message indicating where the ambiguity happened.
Currently ambiguity must be explicitly checked for. If you don't, and
it's a problem, you (in effect) have a silent error condition. Should I
treat ambiguous parses as errors?
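
As a rough sketch of the behavior being proposed, and not an existing Marpa method, a wrapper along these lines would fold everything into a single exception path. The name parse_once() and its error messages are hypothetical. It assumes only that $recce is an object whose read() throws on failure and whose value() returns a reference for each successive parse tree and undef when there are no more, as the documentation quoted below describes.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Marpa. $recce is any object with
# read() and value() methods that behave as the quoted docs describe.
sub parse_once {
    my ( $recce, $input_ref ) = @_;

    # read() throws if the input is not a valid prefix;
    # that exception is simply allowed to propagate.
    $recce->read($input_ref);

    # undef here means the input was a valid prefix but not a sentence.
    my $value_ref = $recce->value();
    die "No parse: input ended before a complete sentence was seen\n"
        if not defined $value_ref;

    # A second parse tree means the parse is ambiguous.
    die "Parse is ambiguous\n" if defined $recce->value();

    return $value_ref;
}
```

A caller would then write something like `my $value_ref = eval { parse_once( $recce, \$input ) };` and inspect `$@` once. Note that this sketch detects ambiguity by evaluating a second parse tree, so any semantic actions run a second time, and it does not report where the ambiguity occurred.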
-- jeffrey
On 08/13/2014 08:34 PM, Christopher Layne wrote:
Let's talk about this common Marpa idiom I use in my own code:
my ($len, $value);
eval { $len = $parser->read(\$input) };
if ($@) {
    chomp $@;
    return (undef, $@);
} elsif (!($value = $parser->value())) {
    return (undef, "Parse failed.");
}
# empty input streams are $value = \undef
return ($$value || [], undef);
My question here is why have two different error paths? The only time I ever see value() returning
an error is when the input is "looking valid" but then abruptly hits EOF. In the 2 cases
below, where "Parse failed." is returned (which means value() returned undef), the input
was valid *up to that point*. In the cases where read() returns a failure, the input would never
have been valid, regardless of EOF or not. Now these are obviously intentionally broken unit tests,
as it's testing the error handling of the grammar, but the part I've always wondered about is why
some error cases hit read() and some hit value().
The docs say this WRT error-handling:
read():
A parse is said to be exhausted if, based on the input read so far, there is no
way for it to continue successfully. Exhaustion is not a problem if that Marpa
has read all the way to the end of the input, or if it is pausing for some
other reason. Otherwise, read() treats an exhausted parse as a failure.
On failure, read() throws an exception. The call is considered successful if it
ended because a parse was found, or because internal scanning was paused. On
success, read() returns the location in the input stream at which internal
scanning ended. This value may be zero.
value():
The value method call evaluates the next parse tree in the parse series, and
returns a reference to the parse result for that parse tree. If there are no
more parse trees, the value method returns undef.
Maybe what I'm asking here is why 'partial input' cases are considered valid to read() but not
valid to value()? If one were to look at the language of read(), "there is no way for it to
continue successfully", wouldn't hitting the end of input be considered "no way for it to
continue successfully?" In PRD this is usually handled by looking for the token /\Z/ but I'm
not sure how to do the same thing with Marpa or if I should even be doing that.
I'm not going to post the whole grammar because it's large and basically self-evident
"item name tag = value" type stuff.
Unit tests that show different behavior (ignore the PRD stuff, it's in there
for testing the PRD implementation of the same grammar):
# busted case
$in = <<'EOF';
item this-is-not-wrong
tag huh
EOF
$res_out = {
    'marpa' => [
        undef,
        'Error in SLIF parse: No lexeme found at line 2, column 6
* String before error: item this-is-not-wrong\\n\\ttag\\s
* The error was at line 2, column 6, and at character 0x0068 \'h\', ...
* here: huh\\n
'
    ],
    'recdescent' => [
        undef,
        'ERROR (line 1): Invalid Items: Was expecting EOF but found "tag huh" instead'
    ],
};
@$res = Verdad::parseString($in);
is_deeply_wrap_exc($res, $res_out->{$parser});
# busted case
$in = <<'EOF';
item this-is-not-wrong
tag =
EOF
$res_out = {
    'marpa' => [
        undef,
        'Parse failed.'
    ],
    'recdescent' => [
        undef,
        'ERROR (line 1): Invalid Items: Was expecting EOF but found "tag =" instead'
    ],
};
@$res = Verdad::parseString($in);
is_deeply_wrap_exc($res, $res_out->{$parser});
# busted case
$in = <<'EOF';
item this-is-not-wrong
tag = 'busted
EOF
$res_out = {
    'marpa' => [
        undef,
        'Parse failed.'
    ],
    'recdescent' => [
        undef,
        'ERROR (line 1): Invalid Items: Was expecting EOF but found "tag = \'busted" instead'
    ],
};
@$res = Verdad::parseString($in);
is_deeply_wrap_exc($res, $res_out->{$parser});