Here is one of my recent endeavors in coding that I am particularly satisfied with. It is a rewrite of a Shared Packed Parse Forest. I found a way to greatly simplify it, resulting in code that is 680 lines long. The key insight for Earley parsing is that partial parses can be sorted through online sorting. If they enter the parse forest bottom-up, left-to-right*, they are ready for traversal. The only tree traversal left to do to discard failed partial parses, which won't be a part of the full parse. I'm doing this traversal with a mark-and-sweep algorithm similar to the ones used in GC, or mark-and-ignore. Online sort is done with a simple binary (max-)heap.
Jeffrey Kegler remarked that he didn't achieve mathematical simplicity in context-free parsing that he wished for. I did two complete rewrites of my parse forest. The first code was loosely based on that of Jeffrey Kegler's with nooks. The second was my own revision of a parse forest. In the latest, third version, I achieved mathematical simplicity that Jeffrey Kegler pursued from the beginning. You may take a look at the code at https://github.com/pczarn/gearley/tree/master/src/forest. The most important files are "bocage", "node" and "traverse". The bocage is Jeffrey Kegler's term for a parse forest. The file contains creation of both the bocage and its partial parses. Next is "node", which has a packed representation of nodes in 12 bytes. If I could afford to lose this packed representation, the code could be even shorter, but nodes would be 16 bytes in size. Now, I could implement nodes with variable sizes -- some would be 4 or 8 bytes (1 or 2 integers) in size, and the largest 12 bytes in size. Last but not least, traverse is used for evaluating the full parse. As I already explained, it is a simple loop though all nodes in the order they entered the forest. The code has abstraction that allows you to plug in a so-called null forest, skipping forest use and construction entirely, leaving only the recognizer at work. Alternatively, you can plug in your own implementation of a forest, but I don't foresee any use of this except perhaps in a parser generator for extremely optimized parsers, where each generated parser has its own optimal forest implementation. - * I guess it's called rightmost derivation, or x-most derivation, but I'm not sure. -- You received this message because you are subscribed to the Google Groups "marpa parser" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/marpa-parser/d8d19f26-e065-4a9d-908f-c15a7d3ee87c%40googlegroups.com.
