FYI, here's the Perl file:

use strict;
use Parse::RecDescent;

$::RD_ERRORS     = 1; # unless undefined, report fatal errors
$::RD_WARN       = 1; # unless undefined, also report non-fatal problems
$::RD_HINT       = 1; # if defined, also suggest remedies
#$::RD_TRACE      = 1; # if defined, also trace parsers' behaviour
#$::RD_AUTOSTUB   = 1; # if defined, generates "stubs" for undefined rules
$::RD_AUTOACTION = q{print "."}; # if defined, appends specified action to productions

# Load up the grammar from the file
open( my $grammarFile, '<', "QuickGrammar.txt" ) or die "Could not open grammar file: $!\n";
my @grammar = <$grammarFile>;
close($grammarFile);

# Check the grammar
my $parser = Parse::RecDescent->new(join '', @grammar) or die "Bad Grammar";

# Open and save the file contents
open( my $parsedFile, '<', "bigfile.txt" ) or die "Could not open input file: $!\n";
my @data = <$parsedFile>;
close($parsedFile);

# Parse the file contents, joining all of the lines into a single one
my $retValue = $parser->OMDFile(join '', @data); 
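
The read-then-join steps above can also be written as a single slurp. This is only a sketch: slurp_file is a hypothetical helper name, not part of the script or of Parse::RecDescent, and it just tidies the I/O. It is not claimed to change the parse time.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): read a whole file
# into one string instead of reading a list of lines and joining them.
sub slurp_file {
    my ($path) = @_;
    open( my $fh, '<', $path ) or die "Could not open $path: $!\n";
    local $/;             # no record separator: the next read returns everything
    my $text = <$fh>;
    close($fh);
    return $text;
}
```

With that in place, the parse call would read $parser->OMDFile( slurp_file("bigfile.txt") ), and the grammar could be loaded the same way.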

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
> Sent: Monday, July 17, 2006 2:47 PM
> To: recdescent@perl.org
> Subject: Speed issue w/ LARGE parsed file
> 
> Hey all,
>       I'm a recdescent newbie, so please cut me some slack ;)
> 
> I've got a ~1.5Mb file that I'm parsing.  The grammar is 
> pretty well established, in that it comes from a formal 
> paper that describes it in EBNF notation.  I've looked 
> at the EBNF notation, and done my best to simplify it.  For 
> example, the EBNF says some number should be from 0-65535, so 
> I just specify /\d{1,5}/ to simplify & speed up the processing.
> 
> The first working grammar (tested using a subset of 
> the file) has about 85 separate rules.
> I tried running it on the "full" file, but it just took too damn long.
> 
> So, I went about creating a much simpler parser (even 
> dumber), so I could do some pre-parsing, to speed things up.
> 
> The file looks like:
> 
> (foo bar)
> (foo (bar baz))
> (foo "bar")
> (foo (bar "baz"))
> 
> And the data can be nested several levels deep.  E.g.:
> 
> (foo (bar baz)
> (baz baz)
> (baz (baz (baz(baz "bar")))))
> 
> So, I dumbed down my grammar (as can be seen below) but it 
> still takes longer than I have patience for ( > 10 minutes) to parse.
> 
> Am I SOL with parsing this file using RecDescent, or is 
> something glaringly bad w/ the grammar below?
> 
> TIA
> 
> --dw
> 
> ############################################################
> # The main file has a header, and one or more object models 
> File : Header Model(s)
> 
> # Define what the header is
> Header: 
>     "(" /Header[\s]v[\d]+\.[\d]+\.[\d]+\.[\d]+/ ")" 
>     | <error: Invalid Header>
> 
> # Define what the object model is
> Model: 
>     "(Model"
>         Item(s)        
>     ")"
>     | <error: Invalid parse of the ObjectModel>
> 
> Item:
>     "(" /\b[^\s]+\b/ /[^\(\)]*/ Item(s?) ")"  # Simply two tokens
>     | "(" /\b[^\s]+\b/ "\"" /[^\"]*/ "\"" Item(s?) ")"
>     | <error>
> 
> 
> # These items left in for clarity's sake. Functionally equivalent
> # to Item above, but hopefully faster
> OldItem:
>     "(" Label Data Item(s?) ")"  # Simply two tokens
>     | "(" Label QuotedData Item(s?) ")"
>     | <error>
> 
> Label:
>       /\b[^\s]+\b/
> 
> Data:
>       /[^\(\)]*/
> 
> QuotedData:
>       "\"" /[^\"]*/ "\""
> ############################################################
> 
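
For the pre-parsing step mentioned in the quoted message, the nested "(label data (child ...))" layout can also be read by a small hand-written scanner. The following is a rough sketch and not a RecDescent grammar: read_item and the [ label, data, children ] tree shape are my own assumptions, and it only covers the simplified Item layout quoted above (a label, then optional bare or quoted data, then nested items).

```perl
use strict;
use warnings;

# Hypothetical reader for the simplified Item layout.
# Returns a tree of [ $label, $data, \@children ] array refs.
# $text_ref is a reference to the input string; pos() on that string
# tracks how far we have read, so recursive calls share the position.
sub read_item {
    my ($text_ref) = @_;

    $$text_ref =~ /\G\s*\(\s*/gc
        or die "expected '(' at offset " . ( pos($$text_ref) // 0 ) . "\n";
    $$text_ref =~ /\G([^\s()"]+)\s*/gc
        or die "expected a label at offset " . pos($$text_ref) . "\n";
    my $label = $1;

    # Optional data: either a quoted string or bare text without parens.
    my $data;
    if ( $$text_ref =~ /\G"([^"]*)"\s*/gc ) {
        $data = $1;
    }
    elsif ( $$text_ref =~ /\G([^()"]+)/gc ) {
        ( $data = $1 ) =~ s/\s+\z//;    # drop trailing whitespace
    }

    # Zero or more nested items, each starting with '('.
    my @children;
    while ( substr( $$text_ref, pos($$text_ref), 1 ) eq '(' ) {
        push @children, read_item($text_ref);
        $$text_ref =~ /\G\s+/gc;        # skip whitespace between items
    }

    $$text_ref =~ /\G\)/gc
        or die "expected ')' at offset " . pos($$text_ref) . "\n";
    return [ $label, $data, \@children ];
}

# Usage on the multi-level sample from the quoted message.
my $sample = qq{(foo (bar baz)\n(baz baz)\n(baz (baz (baz(baz "bar")))))};
my $tree   = read_item( \$sample );
print "label: $tree->[0], children: ", scalar @{ $tree->[2] }, "\n";
```

The \G anchor with the /gc flags makes each match continue from where the previous one stopped and keeps the position when a match fails, so the text is scanned once from left to right without backing up.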
