I'm interested in PEG parsers, as the implementation for the language
Lua is impressively fast. When I saw that factor had a PEG parser
library included, I was curious to see how it would compare. As a
performance test, I tried to implement a search for the word "Omega"
in the text of the King James bible - this is a benchmark that was
used for the Lua implementation, and I've used it to test my own code
previously.

I wrote the following parser in factor:

EBNF: omega
omega = 'O' 'm' 'e' 'g' 'a' | . omega
;EBNF

There may be alternative approaches (I'd be interested if so) but this
is basically a direct translation of the Lua PEG. Against the 4.5
million characters in the text I was using, the Lua code finds the
first occurrence (fairly near the end) in 0.14 seconds.

Unfortunately, factor failed with a retain stack overflow error - my
code was simply "kjv10.txt" ascii file-contents omega

I assume from this that the factor implementation is somehow recursive
- the Lua implementation uses a small virtual machine coded in C to
run a compiled bytecode form of the grammar. A very quickly hacked
together implementation of an equivalent interpreter in factor was
still running after an hour or so - I'm guessing that either I had an
infinite loop in there, or my implementation was severely suboptimal
(hardly surprising, as I've been looking at factor for about 2 days
total so far!)

So, some questions:

1. Is my parser above a sensible way of writing something that skips
arbitrary text and finds the first occurrence of a string? If not,
what should I be using?
2. Is the retain stack overflow error unavoidable (naively, my code is
recursing 4 million or so times) or is this a shallow bug that could
be fixed (tail call optimisation, maybe)?
3. Is there a good example anywhere of how to write a high-performance
bytecode interpreter in factor, which I could use as a guide to
implementing the Lua algorithm in factor (my experience so far says
that this isn't something I'll get right if I use the "write correct,
high-level code first, then optimise later" approach given my current
level of knowledge :-))

As a side note, there doesn't seem to be much documentation of the peg
modules (or I'm not finding it somehow). Most of what I've found out,
I got from reading some of the articles on Chris Double's website. Is
there anything else I can look at on the subject? I tried reading the
source, but my head exploded :-)

Thanks,
Paul

------------------------------------------------------------------------------
This SF.net email is sponsored by:
SourcForge Community
SourceForge wants to tell your story.
http://p.sf.net/sfu/sf-spreadtheword
_______________________________________________
Factor-talk mailing list
Factor-talk@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/factor-talk

Reply via email to