Hello.

I have a simple grammar (used in a POE Filter) like:

================
dictionary : '{' key_value(s /\|/) '}'
key_value : key '=' value
key : /[A-Za-z0-9_ ]+/
value : dictionary | string
string : /[^|\}]*/
================

with a valid test string like:

{reply_to={message_type=portfolio_get|private=QQQ}|portfolio_name=P|ordervalidate=OK}

which dumps like:

{
 'ordervalidate' => 'OK',
 'portfolio_name' => 'P',
 'reply_to' => {
   'message_type' => 'portfolio_get',
   'private' => 'QQQ'
 }
}



And a parser:

================
#!/usr/bin/perl

use strict;
use warnings;

use Parse::RecDescent;
use Data::Dumper;    # for the (commented-out) dump below

$::RD_HINT = 1;
$::RD_ERRORS = 1;
$::RD_WARN = 1;

my $parser = Parse::RecDescent->new(q{
        dictionary : '{' key_value(s /\|/) '}'
            { $return = { map { $_->[0] => $_->[1] } @{$item[2]} } }
        key_value : key '=' value
            { [ @item{qw(key value)} ] }
        key : /[A-Za-z0-9_ ]+/
        value : dictionary | string
        string : /[^|\}]*/
});

while (<>) {
   my $msg = $parser->dictionary($_);
#    print Dumper($msg);
}
================





However, it gets very slow with larger files. If I generate data files with 
adjustable size with:

==============
#!/usr/bin/perl

use strict;
use warnings;

print 
"{reply_to={message_type=portfolio_get|private=PP}|portfolio_name=PP|ordervalidate=OK|portfolio_positions={";

print join "|", map 
"portfolio_position$_={instrument_id={instrument_tag=$_|market=MMMM|feedcode=FFFF|isincode=IIII|underlying=UUUU|kind=Spot|currency=CUR|multiplier=2|assettype=Equities}|volume=$_|invested=$_|accrued=$_|fee=$_|commission=$_|change_in_volume=$_|change_in_invested=$_|change_in_accrued=$_|change_in_fee=$_|change_in_commission=$_|currency=CUR}",
 1 .. shift;

print "}}|error=0}";
==============



I get execution times like (size = generated record count, bytes = file
size, time in seconds):

size    bytes   time
10      3398    .25
20      6788    .42
50      16958   1.17
100     33920   3.29
200     69020   10.9
300     104120  24.1
400     139220  42.0
500     174320  63.8
1000    349832  264
1500    531332  605

and these are user CPU times, not wallclock - it's eating all the CPU for a
long time! The growth looks roughly quadratic in the input size (doubling
the size quadruples the time), and the files aren't *that* big.

How can I make it faster? The grammar already seems as simple as it can be.
Please help!


Sincerely.

/ Jan Sundberg


