I'm currently immersed in a rewrite of the Marpa theory paper, and until I'm done with that won't have the cycles to give this work the sort of serious attention this deserves. Some random notes:
"Bocage" is my term, and I developed them independently, but Elizabeth Scott preceded me, and did a very nice write-up. So close are the two that her paper is the write-up I might have needed to do, except that hers is probably better than what I would have turned out. In Marpa::R3, the bocage is being eliminated, in favor of evaluation based on the ASF's of Marpa::R2. I look forward to a closer look, but alas it cannot happen immediately. On Sat, Oct 26, 2019 at 7:13 AM Piotr Czarnecki <[email protected]> wrote: > Here is one of my recent endeavors in coding that I am particularly > satisfied with. It is a rewrite of a Shared Packed Parse Forest. I found a > way to greatly simplify it, resulting in code that is 680 lines long. The > key insight for Earley parsing is that partial parses can be sorted through > online sorting. If they enter the parse forest bottom-up, left-to-right*, > they are ready for traversal. The only tree traversal left to do to discard > failed partial parses, which won't be a part of the full parse. I'm doing > this traversal with a mark-and-sweep algorithm similar to the ones used in > GC, or mark-and-ignore. Online sort is done with a simple binary (max-)heap. > > Jeffrey Kegler remarked that he didn't achieve mathematical simplicity in > context-free parsing that he wished for. I did two complete rewrites of my > parse forest. The first code was loosely based on that of Jeffrey Kegler's > with nooks. The second was my own revision of a parse forest. In the > latest, third version, I achieved mathematical simplicity that Jeffrey > Kegler pursued from the beginning. > > You may take a look at the code at > https://github.com/pczarn/gearley/tree/master/src/forest. The most > important files are "bocage", "node" and "traverse". The bocage is Jeffrey > Kegler's term for a parse forest. The file contains creation of both the > bocage and its partial parses. > > Next is "node", which has a packed representation of nodes in 12 bytes. If > I could afford to lose this packed representation, the code could be even > shorter, but nodes would be 16 bytes in size. Now, I could implement nodes > with variable sizes -- some would be 4 or 8 bytes (1 or 2 integers) in > size, and the largest 12 bytes in size. > > Last but not least, traverse is used for evaluating the full parse. As I > already explained, it is a simple loop though all nodes in the order they > entered the forest. > > The code has abstraction that allows you to plug in a so-called null > forest, skipping forest use and construction entirely, leaving only the > recognizer at work. Alternatively, you can plug in your own implementation > of a forest, but I don't foresee any use of this except perhaps in a parser > generator for extremely optimized parsers, where each generated parser has > its own optimal forest implementation. > > - * I guess it's called rightmost derivation, or x-most derivation, but > I'm not sure. > > -- > You received this message because you are subscribed to the Google Groups > "marpa parser" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > To view this discussion on the web visit > https://groups.google.com/d/msgid/marpa-parser/d8d19f26-e065-4a9d-908f-c15a7d3ee87c%40googlegroups.com > <https://groups.google.com/d/msgid/marpa-parser/d8d19f26-e065-4a9d-908f-c15a7d3ee87c%40googlegroups.com?utm_medium=email&utm_source=footer> > . > -- You received this message because you are subscribed to the Google Groups "marpa parser" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/marpa-parser/CA%2B2Wrv-XHQ9PcJHF-DVRTGJ9D2pRv-Rs04fiOzx86a4Mp9AQrg%40mail.gmail.com.
