Ron D. Smith wrote:
On Wednesday, Jun 4, 2003 h. w. neff said:
hi.
i'm using P::RD and the fully interpreted version of my
resulting tool is incredibly slow.
the half-step of precompiling my grammar into a .pm helps
somewhat in the speed department, albiet with different
behaviours.
i was really hoping to feed my P::RD based beastie to perlcc
and get something much faster and self contained.
Well this is *a* solution, but there are others. With Perl there is always
More Than One Way To Do It(tm).
of this i am aware and appreciative.
I wrote a parser for a hardware description language using a hierarchical
approach and got it to work. I observed that it was incredibly slow, like
you are complaining about.
mine's an assembler with c-like syntax for a machine with multiple
execution modules and addressing units.
some of the performance hit doubtless comes from a requirement to
have clauses with multiple phrases with from zero to five terms
in any orderwell, any order was my idea.
(think VLIW (very long instruction word) where control of individual
registers occurs on an instruction by instruction basis.)
I turned on $::RD_TRACE and took a good hard look at the result. I
discovered that the parser was spending most of its time pursuing false
paths, so it was investing a lot of effort but not doing a lot of work.
now that i've got it working, i shall do this sort of examination.
There are a number of performance tricks that might help. These are
general, your mileage may vary.
1) flatten descriptions.
Hierarchical definition is easy to define and debug, but its slow. Instead
of:
DEFINITION:
foo {} |
bar {} |
splat {}
foo:
/regx/ 'constant' variable(s)
etc.
Try this instead:
DEFINITION:
/regx/ 'constant' variable(s) {} |
in-line bar definition {} |
in-line splat definition
i need to do some hierarchical compaction.
for the initial go i've elaborated so that every place a foo can
happen always goes to same one foo production, as you've shown.
2) use look-ahead
Using the lookahead feature can significantly shorten things. For example:
DEFINITION:
tkid 'foo' foo_type {} |
tkid 'bar' bar_type {} |
tkid 'splat' splat_type {} |
tkid {}
Here a tkid can be followed by a list of keywords, or by nothing. The parser
spends an enormous time trying them all.
This can be:
DEFINITION:
tkid ...keywords process_keywords {} |
tkid
keywords:
/(foo|bar|splat)\b/
process_keywords:
'foo' foo_type |
etc.
i have used some lookahead, as well as commit where i can.
i've not yet looked into trying to propogate committedness back
up the hierarchy (usually when i can commit at some level,
whole subtrees at the upper level can/should be trimmed so
they're not tried in case of subsequent failure.
3) Avoid singleton definitions unless there can be a clear performance
advantage. Why add hierarchy if it is not necessary, it just adds additional
parser calls.
(guess i wonder what such a case might be where there is a performance
advantage -- documentation/maintenance advantage, yes, but perf?)
4) throw away things.
Suppose that a tkid can be a /\w+/ string, but cannot be 'foo' or 'bar',
which are keywords. You can have two tests, which will rarely trigger or you
can create a greedy check which then fails.
For example:
tkid:
/\w+/
{
if (grep(($id eq $_),@all_keywords)) {
return=undef;
} else {
return=$item[1];
}
}
So most of the time it will succeed, but in the rare case it fails, you are
safe. This does not affect the definition of tkid, but affects the parent.
To handle the keywords case you would have had to specify them first like
this:
DEFINITION:
'foo' |
'bar' |
tkid
So you make two keyword tests for (typically) no purpose. But you have to
otherwise it will not parse correctly.
Using the above definition of tkid, you can modify DEFINITION as follows:
DEFINITION:
tkid |
'foo' |
'bar'
This works because if tkid is either foo|bar the test fails and the parser
proceeds.
good thought, thanks.
some judicious reordering can probably be done.
5) use regular expressions
By example, instead of this:
DEFINITION:
'(' /\d+/ ')' {}
Do this instead:
DEFINITION:
/\(\s*(\d+)\s*\)/ {my $arg=$1; ...}
ok, i know of a place or two where this applies.
You can use these to create really interesting examples. As an extreme
example of in-lining and regular expression usage I offer this:
declput:
# this really hairy thing is a performance optimization.
/(dcoutput|hinput|houtput|input|internal|ioput|linput|loutput|modifies|output|
produces|uses)\s+((?:(?:boolean|integer|node|real|string|bus|np|ntri|own|pp|rn
tri|rom|rtri|state|tri|wand|wor)\s+)+)(\$?[A-Za-z](?:(?:[a-zA-Z\d\.\$\#]+)|(?:
_(?!_)))*)\s*(\[\d+(?::\d+)?\])?\s*([EMAIL PROTECTED](?:(?:[a-zA-Z\d\.\$\#]+)|(