Subtleties of read() vs value() error handling

Christopher Layne Wed, 13 Aug 2014 20:35:23 -0700

Let's talk about this common Marpa idiom I use in my own code:

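        my ($len, $value);
        # read() signals failure by throwing; the trailing "1" guards
        # against $@ being cleared before we can test it
        if (!eval { $len = $parser->read(\$input); 1 }) {
                my $err = $@ || 'read() failed';
                chomp $err;
                return (undef, $err);
        } elsif (!defined($value = $parser->value())) {
                # value() signals failure by returning undef
                return (undef, "Parse failed.");
        }

        # an empty input stream yields $value = \undef
        return ($$value // [], undef);
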
My question is: why are there two different error paths? The only time I ever
see value() return undef is when the input "looks valid" but then abruptly
hits EOF. In the two cases below where "Parse failed." is returned (meaning
value() returned undef), the input was valid *up to that point*. In the cases
where read() throws, the input could never have been valid, whether or not EOF
was reached. These are obviously intentionally broken unit tests, since they
exercise the grammar's error handling, but what I've always wondered is why
some errors surface in read() and others in value().
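A minimal sketch of the two failure channels, runnable without Marpa. It wraps essentially the idiom above in a hypothetical helper `parse_with` and drives it with hypothetical stub classes (`StubReadFails`, `StubNoValue`, `StubEmpty`) that only mimic the read()/value() contract quoted from the docs below; they are assumptions for illustration, not Marpa's recognizer.

```perl
use strict;
use warnings;

# Hypothetical stub: read() throws, the way the SLIF does when no lexeme is found.
package StubReadFails;
sub new   { return bless {}, shift }
sub read  { die "Error in SLIF parse: No lexeme found\n" }
sub value { die "value() should not be reached\n" }

# Hypothetical stub: read() accepts a viable prefix, but there is no parse tree.
package StubNoValue;
sub new   { return bless {}, shift }
sub read  { my ( $self, $input_ref ) = @_; return length $$input_ref }
sub value { return undef }

# Hypothetical stub: a successful parse of empty input yields \undef.
package StubEmpty;
sub new   { return bless {}, shift }
sub read  { return 0 }
sub value { return \undef }

package main;

# Essentially the idiom above, wrapped in a hypothetical helper.
sub parse_with {
    my ( $parser, $input ) = @_;
    my $value;
    if ( !eval { $parser->read( \$input ); 1 } ) {
        my $err = $@ || 'read() failed';    # channel 1: exception from read()
        chomp $err;
        return ( undef, $err );
    }
    elsif ( !defined( $value = $parser->value() ) ) {
        return ( undef, 'Parse failed.' );  # channel 2: undef from value()
    }
    return ( $$value // [], undef );
}

for my $class (qw(StubReadFails StubNoValue StubEmpty)) {
    my ( $result, $error ) = parse_with( $class->new, "item x\n\ttag =\n" );
    printf "%-13s %s\n", $class, defined $error ? "error: $error" : "ok: [@$result]";
}
```

Each stub lands in a different branch, so the caller sees one (result, error) pair even though the failures come from two different places.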

The docs say this about error handling:

read():
A parse is said to be exhausted if, based on the input read so far, there is no 
way for it to continue successfully. Exhaustion is not a problem if Marpa
has read all the way to the end of the input, or if it is pausing for some 
other reason. Otherwise, read() treats an exhausted parse as a failure.

On failure, read() throws an exception. The call is considered successful if it 
ended because a parse was found, or because internal scanning was paused. On 
success, read() returns the location in the input stream at which internal 
scanning ended. This value may be zero.


value():
The value method call evaluates the next parse tree in the parse series, and 
returns a reference to the parse result for that parse tree. If there are no 
more parse trees, the value method returns undef.
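The "parse series" wording is why a single value() call works as a failure test in the idiom: undef on the first call means there was no parse tree at all, while an ambiguous parse would keep returning references. A small sketch of that contract, using a hypothetical closure `make_parse_series` as a stand-in for the recognizer (an assumption for illustration, not Marpa's API):

```perl
use strict;
use warnings;

# Hypothetical stand-in for value(): each call returns a reference to the
# next parse result in the series, and undef once the series is used up.
sub make_parse_series {
    my @results = @_;
    return sub {
        return undef unless @results;
        my $result = shift @results;
        return \$result;
    };
}

# An ambiguous input has several trees; a failed parse has none.
my $ambiguous = make_parse_series( ['tree A'], ['tree B'] );
my $trees = 0;
while ( defined( my $ref = $ambiguous->() ) ) {
    $trees++;
}
print "trees: $trees\n";

my $failed = make_parse_series();
print defined $failed->() ? "parsed\n" : "no parse tree: Parse failed.\n";
```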


Maybe what I'm really asking is why 'partial input' cases are acceptable to
read() but not to value(). Going by the wording of the read() docs, "there is
no way for it to continue successfully", shouldn't hitting the end of input
count as "no way for it to continue successfully"? In PRD (Parse::RecDescent)
this is usually handled by matching the token /\Z/, but I'm not sure how to do
the same thing in Marpa, or whether I should even be doing that.
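One way to see the distinction the read() docs draw is a toy model. The sketch below is not Marpa (Marpa is Earley-based; this is a hand-written state machine), and its grammar, token patterns, and the helpers `toy_read`/`toy_value` are hypothetical stand-ins for the "item NAME / TAG = VALUE" grammar. toy_read() dies only when the tokens read so far cannot begin any valid input, which is the quoted definition of exhaustion; toy_value() succeeds only on a complete sentence. Under those assumptions, "item x tag huh" fails in read(), while "item x tag =" survives read() and fails in value(), because a VALUE could still follow.

```perl
use strict;
use warnings;

# Toy model only, not Marpa's algorithm: a hypothetical state machine for
# "item NAME" followed by zero or more "TAG = VALUE" triples.
my $NAME  = qr/^[A-Za-z][\w-]*$/;
my $VALUE = qr/^(?:'[^']*'|[\w-]+)$/;
my %edges = (                  # state => [ token pattern, next state ]
    0 => [ qr/^item$/, 1 ],
    1 => [ $NAME,      2 ],
    2 => [ $NAME,      3 ],    # a tag name
    3 => [ qr/^=$/,    4 ],
    4 => [ $VALUE,     2 ],    # VALUE completes a tag line
);
my %accepting = ( 2 => 1 );    # a complete "item" block

# Like read(): die only when the parse is exhausted, i.e. the tokens read so
# far cannot start any valid input. Running out of tokens in the middle of a
# tag line is NOT exhaustion. Returns the final state.
sub toy_read {
    my @tokens = @_;
    my $state  = 0;
    for my $i ( 0 .. $#tokens ) {
        my ( $pattern, $next ) = @{ $edges{$state} };
        die "exhausted at token $i ('$tokens[$i]')\n"
            unless $tokens[$i] =~ $pattern;
        $state = $next;
    }
    return $state;
}

# Like value(): a result exists only if the whole input is a complete sentence.
sub toy_value {
    my ($state) = @_;
    return $accepting{$state} ? \"complete" : undef;
}

for my $case ( [qw(item x tag huh)], [qw(item x tag =)], [qw(item x tag = 'v')] ) {
    my $state = eval { toy_read(@$case) };
    if ( !defined $state ) {
        print "@$case => read() fails: $@";
        next;
    }
    print "@$case => ",
        ( defined toy_value($state) ? "value() ok\n" : "value() undef: Parse failed.\n" );
}
```

In this model the end-of-input requirement lives entirely in toy_value(), roughly the job /\Z/ does in the PRD grammar; that Marpa's value() plays the same role for every grammar is an assumption of the sketch, not something the quoted docs spell out.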


I'm not going to post the whole grammar because it's large and basically 
self-evident "item name tag = value" type stuff.

Here are unit tests that show the different behavior (ignore the PRD parts;
they're there to test the PRD implementation of the same grammar):

# busted case
$in = <<'EOF';
item this-is-not-wrong
        tag huh
EOF
$res_out = {
        'marpa' => [
                undef,
                'Error in SLIF parse: No lexeme found at line 2, column 6
* String before error: item this-is-not-wrong\\n\\ttag\\s
* The error was at line 2, column 6, and at character 0x0068 \'h\', ...
* here: huh\\n
'
        ],
        'recdescent' => [
                undef,
                'ERROR (line 1): Invalid Items: Was expecting EOF but found "tag huh" instead'
        ],
};

@$res = Verdad::parseString($in);
is_deeply_wrap_exc($res, $res_out->{$parser});

# busted case
$in = <<'EOF';
item this-is-not-wrong
        tag =
EOF
$res_out = {
        'marpa' => [
                undef,
                'Parse failed.'
        ],
        'recdescent' => [
                undef,
                'ERROR (line 1): Invalid Items: Was expecting EOF but found "tag =" instead'
        ],
};

@$res = Verdad::parseString($in);
is_deeply_wrap_exc($res, $res_out->{$parser});

# busted case
$in = <<'EOF';
item this-is-not-wrong
        tag = 'busted
EOF
$res_out = {
        'marpa' => [
                undef,
                'Parse failed.'
        ],
        'recdescent' => [
                undef,
                'ERROR (line 1): Invalid Items: Was expecting EOF but found "tag = \'busted" instead'
        ],
};

@$res = Verdad::parseString($in);
is_deeply_wrap_exc($res, $res_out->{$parser});
