Here is one of my recent endeavors in coding that I am particularly 
satisfied with. It is a rewrite of a Shared Packed Parse Forest. I found a 
way to greatly simplify it, resulting in code that is 680 lines long. The 
key insight for Earley parsing is that partial parses can be sorted through 
online sorting. If they enter the parse forest bottom-up, left-to-right*, 
they are ready for traversal. The only tree traversal left to do to discard 
failed partial parses, which won't be a part of the full parse. I'm doing 
this traversal with a mark-and-sweep algorithm similar to the ones used in 
GC, or mark-and-ignore. Online sort is done with a simple binary (max-)heap.

Jeffrey Kegler remarked that he didn't achieve mathematical simplicity in 
context-free parsing that he wished for. I did two complete rewrites of my 
parse forest. The first code was loosely based on that of Jeffrey Kegler's 
with nooks. The second was my own revision of a parse forest. In the 
latest, third version, I achieved mathematical simplicity that Jeffrey 
Kegler pursued from the beginning.

You may take a look at the code at 
https://github.com/pczarn/gearley/tree/master/src/forest. The most 
important files are "bocage", "node" and "traverse". The bocage is Jeffrey 
Kegler's term for a parse forest. The file contains creation of both the 
bocage and its partial parses.

Next is "node", which has a packed representation of nodes in 12 bytes. If 
I could afford to lose this packed representation, the code could be even 
shorter, but nodes would be 16 bytes in size. Now, I could implement nodes 
with variable sizes -- some would be 4 or 8 bytes (1 or 2 integers) in 
size, and the largest 12 bytes in size.

Last but not least, traverse is used for evaluating the full parse. As I 
already explained, it is a simple loop though all nodes in the order they 
entered the forest.

The code has abstraction that allows you to plug in a so-called null 
forest, skipping forest use and construction entirely, leaving only the 
recognizer at work. Alternatively, you can plug in your own implementation 
of a forest, but I don't foresee any use of this except perhaps in a parser 
generator for extremely optimized parsers, where each generated parser has 
its own optimal forest implementation.

- * I guess it's called rightmost derivation, or x-most derivation, but I'm 
not sure.

-- 
You received this message because you are subscribed to the Google Groups 
"marpa parser" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/marpa-parser/d8d19f26-e065-4a9d-908f-c15a7d3ee87c%40googlegroups.com.

Reply via email to