>> I think the next step I would take (if that wasn't sufficient) is to
>> create a test suite that tests each grammar rule independently,
>> successively building up to the complex input that is failing.
>
> I'd been doing this for terminals, but it seems like non-terminals get
> harder to write as unit tests, parser state and all. I suppose integration
> tests could work, but it seems the more complex the grammatical structure,
> the more I experience diminishing returns with tests. I just end up with
> tests that fail mysteriously, which is my original problem. Do you have any
> examples about? I wonder if maybe there's something essential I'm failing
> to understand. Maybe looking at your tests might send me along to an a-ha
> moment.

I don't have any tests to show you, but I will try to create an example of
what I mean. (I assume that the parser will fail if it cannot parse the
entire input.)

Given the following grammar rules:

identifier = letter (letter | digit)*

method-call = identifier (ws+ argument)+

(ws matches white space)

argument = number-literal | identifier

number-literal = digit+


I would write tests something like this:

test(number-literal, "1234")
test(identifier, "foo")
test(argument, "5")
test(argument, "bar")
test(method-call, "method a b c 3")


Obviously this is a made-up example, but I hope it shows you what I mean.

John

> Thanks again!
>
>> John
>>
>>
>> On Tue, 2011-12-13 at 23:17 -0800, Casey Ransberger wrote:
>>> I know this has come up before. Hopefully I'm not about to repeat a
>>> lot.
>>>
>>> Debugging this stuff just seems really hard. And significantly harder
>>> than what I've experienced working with e.g. Yacc.
>>>
>>> Hypothesis: Yacc had a lot of time to bake before I ever found it. PEGs
>>> are new, so there's been less overall experience with debugging them.
>>>
>>> I've experimented in what little time I can devote with OMeta,
>>> PetitParser, and Treetop. The debugging experience has been roughly
>>> consistent across all three.
>>>
>>> One particular issue which has bugged me: memoization seems to carry a
>>> lot of instance-state that's really hard to comprehend when the grammar
>>> isn't working as I expect. It's just really hard to use that ocean of
>>> information to figure out what I've done wrong.
>>>
>>> Given that with these new parsing technologies, we're pretty lucky to
>>> see "parse error" as an error message, I can't help but think that it's
>>> worth studying debugging strategies. Heh. :D I'm really not
>>> complaining, I'm just pointing it out.
>>>
>>> Has anyone here found any technique(s) which makes debugging a grammar
>>> written for a PEG/packrat less of a pain in the butt?
>>>
>>> I'd be really interested in hearing about it.
>>>
>>>
>>>
>>> _______________________________________________
>>> fonc mailing list
>>> fonc@vpri.org
>>> http://vpri.org/mailman/listinfo/fonc


