I'm interested in PEG parsers, as the implementation for the language Lua is impressively fast. When I saw that factor had a PEG parser library included, I was curious to see how it would compare. As a performance test, I tried to implement a search for the word "Omega" in the text of the King James bible - this is a benchmark that was used for the Lua implementation, and I've used it to test my own code previously.
I wrote the following parser in factor: EBNF: omega omega = 'O' 'm' 'e' 'g' 'a' | . omega ;EBNF There may be alternative approaches (I'd be interested if so) but this is basically a direct translation of the Lua PEG. Against the 4.5 million characters in the text I was using, the Lua code finds the first occurrence (fairly near the end) in 0.14 seconds. Unfortunately, factor failed with a retain stack overflow error - my code was simply "kjv10.txt" ascii file-contents omega I assume from this that the factor implementation is somehow recursive - the Lua implementation uses a small virtual machine coded in C to run a compiled bytecode form of the grammar. A very quickly hacked together implementation of an equivalent interpreter in factor was still running after an hour or so - I'm guessing that either I had an infinite loop in there, or my implementation was severely suboptimal (hardly surprising, as I've been looking at factor for about 2 days total so far!) So, some questions: 1. Is my parser above a sensible way of writing something that skips arbitrary text and finds the first occurrence of a string? If not, what should I be using? 2. Is the retain stack overflow error unavoidable (naively, my code is recursing 4 million or so times) or is this a shallow bug that could be fixed (tail call optimisation, maybe)? 3. Is there a good example anywhere of how to write a high-performance bytecode interpreter in factor, which I could use as a guide to implementing the Lua algorithm in factor (my experience so far says that this isn't something I'll get right if I use the "write correct, high-level code first, then optimise later" approach given my current level of knowledge :-)) As a side note, there doesn't seem to be much documentation of the peg modules (or I'm not finding it somehow). Most of what I've found out, I got from reading some of the articles on Chris Double's website. Is there anything else I can look at on the subject? I tried reading the source, but my head exploded :-) Thanks, Paul ------------------------------------------------------------------------------ This SF.net email is sponsored by: SourcForge Community SourceForge wants to tell your story. http://p.sf.net/sfu/sf-spreadtheword _______________________________________________ Factor-talk mailing list Factor-talk@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/factor-talk