Re: Subtleties of read() vs value() error handling

Jeffrey Kegler Thu, 14 Aug 2014 10:17:39 -0700

In the following I will speak of "valid prefixes" and "sentences in the language". An input string is a sentence if there is a parse for it in the language (that is, if it is consistent with the grammar). A prefix is valid if and only if it is the prefix of a sentence.

read() fails whenever the input is not a valid prefix -- in other words, when there is simply no way of continuing the input which will produce a valid parse. (Marpa is unusual in being able to detect this point exactly.)

At any point, the input might be a valid prefix of a sentence, but not a sentence. This can happen at the end of input -- if continued, the input might still produce a valid parse, but at the point where you chose to end things, the input is not a sentence and there is no parse. That is the case where read() succeeds but value() returns undef.
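
To make the distinction concrete, here is a toy sketch in plain Perl that does not use Marpa at all. It assumes a hypothetical finite language of just two sentences, and the helper names is_sentence(), is_valid_prefix() and classify() are invented for illustration. A real grammar usually describes an infinite language, and Marpa works the answer out from the grammar rather than from a list.

```perl
use strict;
use warnings;

# Hypothetical toy language: the complete list of its sentences.
my @SENTENCES = ( "item a tag = 'x'", "item a tag = 'y'" );

# A string is a sentence if it is exactly one of the listed sentences.
sub is_sentence {
    my ($input) = @_;
    return scalar grep { $_ eq $input } @SENTENCES;
}

# A string is a valid prefix if some sentence starts with it.
sub is_valid_prefix {
    my ($input) = @_;
    return scalar grep { index( $_, $input ) == 0 } @SENTENCES;
}

# Map the two tests onto the two error paths discussed in this thread.
sub classify {
    my ($input) = @_;
    return 'not a valid prefix (read() would throw)'
        if !is_valid_prefix($input);
    return 'valid prefix, not a sentence (value() would return undef)'
        if !is_sentence($input);
    return 'sentence (value() would return a parse)';
}

print "$_ => ", classify($_), "\n"
    for ( "item a tag huh", "item a tag =", "item a tag = 'x'" );
```

The three inputs are modeled on the unit tests quoted below: one that can never be continued into a sentence, one that stops too early, and one that is complete.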

Obviously, to write Marpa, I had to work out all these distinctions carefully. Another question is whether this distinction needs to be so prominent in the interface. In many cases, this distinction is important to the user, and the emphasis is needed. When I created the interface, I did not know whether these distinctions were important in most cases -- I did not even know which interfaces would be most popular and what would be the focus of the users they attracted.

Perhaps I should add a doit() method which simplifies what seems to have emerged as the most common use case -- reading an input to the end, with no events, and returning one, and only one, parse.


Questions:

1.) What should be its name?  [Probably not doit() ].

2.) I think doit() should throw an exception on error -- that's most convenient in simple apps. Does that sound right?

3.) Also, doit() could catch ambiguous parses, and throw an error on those, with an error message indicating where the ambiguity happened. Currently ambiguity must be explicitly checked for. If you don't, and it's a problem, you (in effect) have a silent error condition. Should I treat ambiguous parses as errors?
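
For concreteness, here is a rough sketch of what such a method might do, written as a free-standing wrapper. The name parse_once() is only a placeholder. It relies on nothing beyond the read() and value() behavior described in this thread: read() throws on failure, and value() returns a reference to the next parse result, or undef when there are no more. Ambiguity is detected here simply by asking for a second parse, so the error message says nothing about where the ambiguity happened. The MockRecce class is a hypothetical stand-in, included only so the sketch runs without Marpa installed.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a recognizer: it only imitates the read()
# and value() behavior described in this thread, not the real Marpa API.
package MockRecce;

sub new {
    my ( $class, %args ) = @_;
    # 'parses' holds the parse results value() hands out in turn;
    # 'read_error', if set, is the message read() dies with.
    my $self = {
        parses     => [ @{ $args{parses} || [] } ],
        read_error => $args{read_error},
    };
    return bless $self, $class;
}

sub read {
    my ( $self, $input_ref ) = @_;
    die $self->{read_error} if defined $self->{read_error};
    return length ${$input_ref};
}

sub value {
    my ($self) = @_;
    return if !@{ $self->{parses} };
    my $result = shift @{ $self->{parses} };
    return \$result;
}

package main;

# Read the whole input and return a reference to its one and only parse
# result; throw an exception in every other case.
sub parse_once {
    my ( $recce, $input_ref ) = @_;
    $recce->read($input_ref);    # throws if the input is not a valid prefix
    my $value_ref = $recce->value();
    die "No parse: input ended before a complete sentence\n"
        if !defined $value_ref;
    die "Ambiguous parse: more than one parse tree\n"
        if defined $recce->value();
    return $value_ref;
}

my $input = "item a tag = 'x'";
my $ok = parse_once( MockRecce->new( parses => ['tree'] ), \$input );
print "one parse: ${$ok}\n";

# No parse at all, then two parses: both are reported as exceptions.
for my $recce ( MockRecce->new( parses => [] ),
    MockRecce->new( parses => [ 'tree1', 'tree2' ] ) )
{
    my $result = eval { parse_once( $recce, \$input ) };
    print "error: $@" if !defined $result;
}
```

With a wrapper like this the caller has a single error path: every failure, whether from read(), from a missing parse, or from ambiguity, arrives as an exception.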


-- jeffrey

On 08/13/2014 08:34 PM, Christopher Layne wrote:

Let's talk about this common Marpa idiom I use in my own code:

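         my ($len, $value);
         # Error path 1: read() throws an exception on failure.
         eval { $len = $parser->read(\$input) };
         if ($@) {
                 chomp $@;
                 return (undef, $@);
         } elsif (!($value = $parser->value())) {
                 # Error path 2: value() returns undef when there is no parse.
                 return (undef, "Parse failed.");
         }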

         # empty input streams are $value = \undef
         return ($$value || [], undef);



My question here is why have two different error paths? The only time I ever see value() returning 
an error is when the input is "looking valid" but then abruptly hits EOF. In the 2 cases 
below, where "Parse failed." is returned (which means value() returned undef), the input 
was valid *up to that point*. In the cases where read() returns a failure, the input would never 
have been valid, regardless of EOF or not. Now these are obviously intentionally broken unit tests, 
as it's testing the error handling of the grammar, but the part I've always wondered about is why 
some error cases hit read() and some hit value().

The docs say this WRT error-handling:

read():
A parse is said to be exhausted if, based on the input read so far, there is no 
way for it to continue successfully. Exhaustion is not a problem if that Marpa 
has read all the way to the end of the input, or if it is pausing for some 
other reason. Otherwise, read() treats an exhausted parse as a failure.

On failure, read() throws an exception. The call is considered successful if it 
ended because a parse was found, or because internal scanning was paused. On 
success, read() returns the location in the input stream at which internal 
scanning ended. This value may be zero.


value():
The value method call evaluates the next parse tree in the parse series, and 
returns a reference to the parse result for that parse tree. If there are no 
more parse trees, the value method returns undef.


Maybe what I'm asking here is why 'partial input' cases are considered valid to read() but not 
valid to value()? If one were to look at the language of read(), "there is no way for it to 
continue successfully", wouldn't hitting the end of input be considered "no way for it to 
continue successfully?" In PRD this is usually handled by looking for the token /\Z/ but I'm 
not sure how to do the same thing with Marpa or if I should even be doing that.


I'm not going to post the whole grammar because it's large and basically self-evident 
"item name tag = value" type stuff.

Unit tests that show different behavior (ignore the PRD stuff, it's in there 
for testing the PRD implementation of the same grammar):

# busted case
$in = <<'EOF';
item this-is-not-wrong
         tag huh
EOF
$res_out = {
         'marpa' => [
   undef,
   'Error in SLIF parse: No lexeme found at line 2, column 6
* String before error: item this-is-not-wrong\\n\\ttag\\s
* The error was at line 2, column 6, and at character 0x0068 \'h\', ...
* here: huh\\n
'
],
         'recdescent' => [
   undef,
   'ERROR (line 1): Invalid Items: Was expecting EOF but found "tag huh" instead'
],
};

@$res = Verdad::parseString($in);
is_deeply_wrap_exc($res, $res_out->{$parser});

# busted case
$in = <<'EOF';
item this-is-not-wrong
         tag =
EOF
$res_out = {
         'marpa' => [
   undef,
   'Parse failed.'
],
         'recdescent' => [
   undef,
   'ERROR (line 1): Invalid Items: Was expecting EOF but found "tag =" instead'
],
};

@$res = Verdad::parseString($in);
is_deeply_wrap_exc($res, $res_out->{$parser});

# busted case
$in = <<'EOF';
item this-is-not-wrong
         tag = 'busted
EOF
$res_out = {
         'marpa' => [
   undef,
   'Parse failed.'
],
         'recdescent' => [
   undef,
   'ERROR (line 1): Invalid Items: Was expecting EOF but found "tag = \'busted" instead'
],
};

@$res = Verdad::parseString($in);
is_deeply_wrap_exc($res, $res_out->{$parser});

