On Wednesday, Jun 4, 2003 "h. w. neff" said:
> hi.
> i'm using P::RD and the fully interpreted version of my
>    resulting tool is incredibly slow.
> the half-step of precompiling my grammar into a .pm helps
>    somewhat in the speed department, albiet with different
>    behaviours.
> i was really hoping to feed my P::RD based beastie to perlcc
>    and get something much faster and self contained.

Well this is *a* solution, but there are others.  With Perl there is always 
More Than One Way To Do It(tm).

I wrote a parser for a hardware description language using a hierarchical 
approach and got it to work.  I observed that it was incredibly slow, just 
as you describe.

I turned on $::RD_TRACE and took a good hard look at the result.  I 
discovered that the parser was spending most of its time pursuing false 
paths, so it was investing a lot of effort but not doing a lot of work.
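For anyone who has not used tracing before, here is a minimal sketch of 
switching it on.  The toy grammar is my own stand-in (an assumption, not the 
HDL grammar discussed here); only $::RD_TRACE itself comes from 
Parse::RecDescent.  The trace goes to STDERR and is verbose, so start with a 
short input.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Set the flag before building the parser so tracing is active throughout.
$::RD_TRACE = 1;

# Hypothetical toy grammar: every alternative re-matches tkid, which is
# exactly the kind of wasted effort the trace makes visible.
my $grammar = q{
    definition: tkid 'foo' | tkid 'bar' | tkid
    tkid: /\w+/
};

my $parser = Parse::RecDescent->new($grammar) or die "bad grammar\n";

# Parsing now logs every rule entered, every token tried and every failure.
my $result = $parser->definition('x bar');
```

Reading the trace for this input should show tkid being matched twice: once 
for the failed 'foo' alternative and once again for 'bar'.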

There are a number of performance "tricks" that might help.  These are 
general; your mileage may vary.

1) flatten descriptions.

Hierarchical definition is easy to define and debug, but it's slow.  Instead 
of:

DEFINITION:
foo {} |
bar {} |
splat {}

foo:
/regx/ 'constant' variable(s)

etc.

Try this instead:

DEFINITION:
/regx/ 'constant' variable(s) {} |
in-line "bar" definition {} |
in-line "splat" definition

2) use "look-ahead"

Using the lookahead feature can significantly shorten things.  For example:

DEFINITION:
tkid 'foo' foo_type {} |
tkid 'bar' bar_type {} |
tkid 'splat' splat_type {} |
tkid {}

Here a tkid can be followed by one of a list of keywords, or by nothing.  The 
parser spends an enormous amount of time trying them all, re-matching tkid 
for every alternative.

This can be:

DEFINITION:
tkid ...keywords process_keywords {} |
tkid

keywords:
/(foo|bar|splat)\b/

process_keywords:
'foo' foo_type |
etc.
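To make the lookahead version concrete, here is a small runnable sketch.  The 
foo_type/bar_type/splat_type rules and the values returned by the actions are 
invented for illustration; only the shape of DEFINITION, keywords and 
process_keywords follows the outline above.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical grammar.  "...keywords" is a positive lookahead: it checks
# that a keyword follows tkid but consumes nothing, so process_keywords
# is only entered when it has a chance of succeeding.
my $grammar = q{
    definition:
          tkid ...keywords process_keywords
              { $return = [ $item{tkid}, $item{process_keywords} ] }
        | tkid
              { $return = [ $item{tkid} ] }

    keywords: /(foo|bar|splat)\b/

    process_keywords:
          'foo'   foo_type    { $return = "foo=$item[2]" }
        | 'bar'   bar_type    { $return = "bar=$item[2]" }
        | 'splat' splat_type  { $return = "splat=$item[2]" }

    foo_type:   /\d+/
    bar_type:   /\w+/
    splat_type: /\w+/

    tkid: /\w+/
};

my $parser = Parse::RecDescent->new($grammar) or die "bad grammar\n";

my $with_kw = $parser->definition('clk foo 42');  # keyword branch
my $bare    = $parser->definition('clk');         # plain tkid branch

print join(', ', @$with_kw), "\n";
print join(', ', @$bare), "\n";
```

With the bare input the first alternative is abandoned after one cheap regex 
test instead of after three failed keyword productions.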

3) Avoid singleton definitions unless there is a clear performance 
advantage.  Why add hierarchy if it is not necessary?  It just adds 
parser calls.

4) throw away things.

Suppose that a tkid can be any /\w+/ string except 'foo' or 'bar', which are 
keywords.  You can either test for each keyword first (tests which will 
rarely trigger), or make a greedy match and then fail it when the match 
turns out to be a keyword.

For example:

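tkid:

/\w+/
{
    # @all_keywords is assumed to be visible here, e.g. declared in a
    # start-up action.  An undef action value fails the production.
    if (grep { $item[1] eq $_ } @all_keywords) {
        $return = undef;
    } else {
        $return = $item[1];
    }
}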

So most of the time it will succeed, but in the rare case it fails, you are 
safe.  This does not change what a tkid is, but it changes how the parent 
rule can be written.  Without the check, you would have had to specify the 
keywords first, like this:

DEFINITION:
'foo' |
'bar' |
tkid

So you make two keyword tests for (typically) no purpose.  But "you have to" 
otherwise it will not parse correctly.

Using the above definition of tkid, you can modify DEFINITION as follows:

DEFINITION:
tkid |
'foo' |
'bar'

This works because if tkid matches 'foo' or 'bar' the test fails and the 
parser proceeds to the next alternative.
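Here is a self-contained sketch of the same idea that can be run as-is.  The 
%is_keyword hash and the "id:"/"kw:" return values are my own illustrative 
choices; a hash lookup is swapped in for the grep shown earlier, and the 
keyword table is declared in a start-up action so the rule actions can see it.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# The block before the first rule is a start-up action; its lexical
# %is_keyword is visible to every action in the grammar.
# tkid matches greedily and then vetoes keywords: an action that
# evaluates to undef makes its production fail.
my $grammar = q{
    { my %is_keyword = map { $_ => 1 } qw(foo bar); }

    definition:
          tkid  { $return = "id:$item[1]" }
        | 'foo' { $return = 'kw:foo' }
        | 'bar' { $return = 'kw:bar' }

    tkid: /\w+/ { $return = $is_keyword{ $item[1] } ? undef : $item[1] }
};

my $parser = Parse::RecDescent->new($grammar) or die "bad grammar\n";

my $ident   = $parser->definition('widget');  # common case: first try wins
my $keyword = $parser->definition('foo');     # rare case: tkid is vetoed

print "$ident\n$keyword\n";
```

The common case (an ordinary identifier) now succeeds on the first 
alternative, and the keyword literals are only tried when tkid has rejected 
its own match.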

5) use regular expressions

For example, instead of this:

DEFINITION:
'(' /\d+/ ')' {}

Do this instead:

DEFINITION:
/\(\s*(\d+)\s*\)/ {my $arg=$1; ...}
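One thing to watch when collapsing tokens into a single regex: the parser 
skips whitespace between separate tokens for you, but inside one regex you 
have to spell it out with \s*.  The plain-Perl sketch below (no parser 
involved; paren_value is a hypothetical helper name) just checks that the 
combined pattern accepts the same inputs the three-token version would.

```perl
use strict;
use warnings;

# The three tokens '(' /\d+/ ')' folded into one pattern, with the
# optional whitespace between them written explicitly.
my $paren_number = qr/\(\s*(\d+)\s*\)/;

# Return the captured number, or undef when the text does not match.
sub paren_value {
    my ($text) = @_;
    return $text =~ /\A\s*$paren_number/ ? $1 : undef;
}

for my $input ('(42)', '( 7 )', '(x)') {
    my $value = paren_value($input);
    printf "%-6s => %s\n", $input, defined $value ? $value : 'no match';
}
```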

You can use these to create really interesting examples.  As an extreme 
example of in-lining and regular expression usage I offer this:

declput:
# this really hairy thing is a performance optimization.
/(dcoutput|hinput|houtput|input|internal|ioput|linput|loutput|modifies|output|produces|uses)\s+((?:(?:boolean|integer|node|real|string|bus|np|ntri|own|pp|rntri|rom|rtri|state|tri|wand|wor)\s+)+)(\$?[A-Za-z](?:(?:[a-zA-Z\d\.\$\#]+)|(?:_(?!_)))*)\s*(\[\d+(?::\d+)?\])?\s*([EMAIL PROTECTED](?:(?:[a-zA-Z\d\.\$\#]+)|(?:_(?!_)))*\s*\(\s*\$?[A-Za-z](?:(?:[a-zA-Z\d\.\$\#]+)|(?:_(?!_)))*\s*(=\s*(\d+|(\"([^\"]|\"\")*\")))?\s*(,\s*\$?[A-Za-z](?:(?:[a-zA-Z\d\.\$\#]+)|(?:_(?!_)))*\s*(=\s*(\d+|(\"([^\"]|\"\")*\")))?\s*)*\)(\s*,\s*\$?[A-Za-z](?:(?:[a-zA-Z\d\.\$\#]+)|(?:_(?!_)))*\s*\(\s*\$?[A-Za-z](?:(?:[a-zA-Z\d\.\$\#]+)|(?:_(?!_)))*\s*(=\s*(\d+|(\"([^\"]|\"\")*\")))?\s*(,\s*\$?[A-Za-z](?:(?:[a-zA-Z\d\.\$\#]+)|(?:_(?!_)))*\s*(=\s*(\d+|(\"([^\"]|\"\")*\")))?\s*)*\))*)?\s*/
{my ($p,$tq,$id,$v,$a)=($1,$2,$3,$4,$5);
 ....


Yes, it's hairy, but I built it up one piece at a time after I was sure the 
parsing was correct.  It looks unmaintainable until you see all of the other 
definitions that went into it using just cut and paste (and which are now 
just commented out).  To maintain it, I disable this optimization, enable 
the old definitions, get them right, and then use cut and paste again.  It 
ends up not being as difficult as it appears at first blush.
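An alternative to cut and paste, which is not what I did but may be easier 
to live with, is to build the big pattern from small named qr// pieces and 
let Perl interpolate them.  The sketch below is plain Perl and covers only a 
shortened subset of the declaration syntax; the variable names and the 
trimmed keyword lists are assumptions for illustration.

```perl
use strict;
use warnings;

# Small, individually testable pieces (subset of the real keyword lists).
my $direction = qr/(?:input|output|ioput)/;
my $type      = qr/(?:boolean|integer|node)/;
my $ident     = qr/\$?[A-Za-z](?:[A-Za-z\d.\$\#]+|_(?!_))*/;
my $range     = qr/\[\d+(?::\d+)?\]/;

# The composite pattern reads almost like the grammar rule it replaces:
# direction, one or more types, identifier, optional bit range.
my $declaration = qr/($direction)\s+((?:$type\s+)+)($ident)\s*($range)?/;

my $line = 'input boolean node clk[7:0]';
my ($dir, $types, $name, $bits) = $line =~ $declaration
    or die "no match\n";

print "dir=$dir types=$types name=$name bits=$bits\n";
```

Each piece can be fixed in one place, and the composite picks up the change 
the next time the program runs.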

The bottom line is that you need to turn on tracing and see what the parser 
is ACTUALLY DOING.  Figure out where it is wasting its time and alter the 
grammar to "not do that". This will help tremendously.

Using my tricks improved the parser speed in my case by a factor of at least 
20 and in some cases by as much as 100.

> 
> when i tried that, however, i ran into the "did not compile,
>    which can't happen!" :-/ problem.
> i asked at perlmonks, with no particular useful resolution
>    wrt P::RD and perlcc.
> having been directed here, i see the attached year old query
>    (below) and no follow up at all.
> 
> is this a case of "abandon hope all ye who enter" ?
> hwn
> 
> 
> >      From: Roman Vasicek <[EMAIL PROTECTED]> 
> >      Date: Mon, 22 Apr 2002 12:33:46 +0200 (CEST) 
> > 
> > Hi,
> >  I want to use perlcc to create standalone executable from my program but
> > it allways fails with message like this
> > 
> > /usr/bin/perlcc: test-prd.pl did not compile, which can't happen:
> > Starting compile
> >  Walking tree
> >  Prescan
> >  Saving methods
> >  Use of uninitialized value in length at
> > /usr/lib/perl5/5.6.1/i586-linux/B/C.pm
> > line 544.
> >  Use of uninitialized value in length at
> > /usr/lib/perl5/5.6.1/i586-linux/B/C.pm
> > line 438.
> >  Can't locate object method "save" via package "UWVS" (perhaps you forgot
> > to load "UWVS"?) at /usr/lib/perl5/5.6.1/i586-linux/B/C.pm line 582.
> >  CHECK failed--call queue aborted.
> > 
> > Is anyone here who was successful to compile with perlcc program which
> > use Parse::RecDescent module? Any hints?
> > 
> > The shortest code producing error message above is
> > 
> > #!/usr/bin/perl -w
> > 
> > use strict;
> > use Parse::RecDescent;
> > 
> > exit 0;
> > 
> > 
> > -- 
> >  best regards
> >   Ing. Roman Vasicek
> > 
> >  software developer
> > +----------------------------------------------------------------------------+
> >  PetaMem s.r.o., Drahobejlova 27/1019, 190 00 Praha 9 - Liben, Czech republic
> >  http://www.petamem.com/
> 


--
 Intel, Corp.
 5000 W. Chandler Blvd.
 Chandler, AZ 85226
