Panini parser & SPPF

Piotr Czarnecki Sat, 26 Oct 2019 04:14:18 -0700

Here is one of my recent endeavors in coding that I am particularly 
satisfied with. It is a rewrite of a Shared Packed Parse Forest. I found a 
way to greatly simplify it, resulting in code that is 680 lines long. The 
key insight for Earley parsing is that partial parses can be sorted through 
online sorting. If they enter the parse forest bottom-up, left-to-right*, 
they are ready for traversal. The only tree traversal left to do to discard 
failed partial parses, which won't be a part of the full parse. I'm doing 
this traversal with a mark-and-sweep algorithm similar to the ones used in 
GC, or mark-and-ignore. Online sort is done with a simple binary (max-)heap.

Jeffrey Kegler remarked that he didn't achieve mathematical simplicity in
context-free parsing that he wished for. I did two complete rewrites of my
parse forest. The first code was loosely based on that of Jeffrey Kegler's
with nooks. The second was my own revision of a parse forest. In the
latest, third version, I achieved mathematical simplicity that Jeffrey
Kegler pursued from the beginning.

You may take a look at the code at
https://github.com/pczarn/gearley/tree/master/src/forest. The most
important files are "bocage", "node" and "traverse". The bocage is Jeffrey
Kegler's term for a parse forest. The file contains creation of both the
bocage and its partial parses.

Next is "node", which has a packed representation of nodes in 12 bytes. If
I could afford to lose this packed representation, the code could be even
shorter, but nodes would be 16 bytes in size. Now, I could implement nodes
with variable sizes -- some would be 4 or 8 bytes (1 or 2 integers) in
size, and the largest 12 bytes in size.

Last but not least, traverse is used for evaluating the full parse. As I
already explained, it is a simple loop though all nodes in the order they
entered the forest.

The code has abstraction that allows you to plug in a so-called null
forest, skipping forest use and construction entirely, leaving only the
recognizer at work. Alternatively, you can plug in your own implementation
of a forest, but I don't foresee any use of this except perhaps in a parser
generator for extremely optimized parsers, where each generated parser has
its own optimal forest implementation.

- * I guess it's called rightmost derivation, or x-most derivation, but I'm
not sure.

--
You received this message because you are subscribed to the Google Groups
"marpa parser" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/marpa-parser/d8d19f26-e065-4a9d-908f-c15a7d3ee87c%40googlegroups.com.

Panini parser & SPPF

Reply via email to