Hi Piotr, This is very cool work! And I am doubly excited it is happening in Rust.
I am a very slow-moving end-user of Shared Packed Parse Forests, as I am gradually (after a multi-year pause) gearing up to parsing all 500 million formulas from arXiv with an incrementally improving ambiguous Marpa grammar. I also happen to be using Rust nowadays and am hoping to offer at least one take on a pragmatic macro language for writing grammars in Rust (currently cleaning up the initial prototype). I failed to get anywhere near to rewriting the SLIF code (https://github.com/jrobsonchase/marpa/issues/2) but maybe a more naive first set of steps will get me closer. While I don't know much on the bocage internals front, just wanted to send an encouraging "this looks cool!" reply - having SPPFs that are extremely efficient while elegantly designed sounds very attractive. I hope the best parts of the work end up landing in the marpa codebase itself, and maybe the Rust talk spikes additional interest. In the latter direction, I recently had a brief exchange with Per from the c2rust team (https://github.com/immunant/c2rust) who seem to be very close to cross-compiling some hallmark C projects (e.g. libxml2) into fully functional Rust. Using c2rust may be an easy "cheat" to transport the thin layer of marpa natively into Rust, potentially unifying a number of efforts. Just listing my dreams out loud for the moment, and expressing some interest here :-) Greetings, Deyan Here is one of my recent endeavors in coding that I am particularly > satisfied with. It is a rewrite of a Shared Packed Parse Forest. I found a > way to greatly simplify it, resulting in code that is 680 lines long. The > key insight for Earley parsing is that partial parses can be sorted through > online sorting. If they enter the parse forest bottom-up, left-to-right*, > they are ready for traversal. The only tree traversal left to do to discard > failed partial parses, which won't be a part of the full parse. I'm doing > this traversal with a mark-and-sweep algorithm similar to the ones used in > GC, or mark-and-ignore. Online sort is done with a simple binary (max-)heap. > > Jeffrey Kegler remarked that he didn't achieve mathematical simplicity in > context-free parsing that he wished for. I did two complete rewrites of my > parse forest. The first code was loosely based on that of Jeffrey Kegler's > with nooks. The second was my own revision of a parse forest. In the > latest, third version, I achieved mathematical simplicity that Jeffrey > Kegler pursued from the beginning. > > You may take a look at the code at > https://github.com/pczarn/gearley/tree/master/src/forest. The most > important files are "bocage", "node" and "traverse". The bocage is Jeffrey > Kegler's term for a parse forest. The file contains creation of both the > bocage and its partial parses. > > Next is "node", which has a packed representation of nodes in 12 bytes. If > I could afford to lose this packed representation, the code could be even > shorter, but nodes would be 16 bytes in size. Now, I could implement nodes > with variable sizes -- some would be 4 or 8 bytes (1 or 2 integers) in > size, and the largest 12 bytes in size. > > Last but not least, traverse is used for evaluating the full parse. As I > already explained, it is a simple loop though all nodes in the order they > entered the forest. > > The code has abstraction that allows you to plug in a so-called null > forest, skipping forest use and construction entirely, leaving only the > recognizer at work. Alternatively, you can plug in your own implementation > of a forest, but I don't foresee any use of this except perhaps in a parser > generator for extremely optimized parsers, where each generated parser has > its own optimal forest implementation. > > - * I guess it's called rightmost derivation, or x-most derivation, but > I'm not sure. > -- You received this message because you are subscribed to the Google Groups "marpa parser" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/marpa-parser/bc79bccc-4cdf-40d3-a83a-fcc959c77471%40googlegroups.com.
