FYI, here's the perl file:

use strict;
use Parse::RecDescent;

$::RD_ERRORS = 1;       # unless undefined, report fatal errors
$::RD_WARN   = 1;       # unless undefined, also report non-fatal problems
$::RD_HINT   = 1;       # if defined, also suggestion remedies
#$::RD_TRACE = 1;       # if defined, also trace parsers' behaviour
#$::RD_AUTOSTUB = 1;    # if defined, generates "stubs" for undefined rules
$::RD_AUTOACTION = q{print "."}; # if defined, appends specified action to productions

# Load up the grammar from the file
open( grammarFile, "QuickGrammar.txt" ) or die "Could not open grammar file\n";
my @grammar = <grammarFile>;
close(grammarFile);

# Check the grammar
my $parser = Parse::RecDescent->new(join '', @grammar) or die "Bad Grammar";

# Open and save the file contents
open( parsedFile, "bigfile.txt" ) or die "Could not open input file\n";
my @data = <parsedFile>;
close(parsedFile);

# Parse the file contents, joining all of the lines into a single one
my $retValue = $parser->OMDFile(join '', @data);

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> Sent: Monday, July 17, 2006 2:47 PM
> To: recdescent@perl.org
> Subject: Speed issue w/ LARGE parsed file
>
> Hey all,
> I'm a recdescent newbie, so please cut me some slack ;)
>
> I've got a ~1.5Mb file that I'm parsing.  The grammar is
> pretty well established, in such that it's from a formal
> paper, and has EBNF notation written about it.  I've looked
> at the EBNF notation, and done my best to simplify it.  In
> other words, EBNF says some number should be from 0-65535, so
> I just specify /\d{1,5}/ to simplify & speed up the processing.
>
> W/ the first set of working grammar (tested using a subset of
> the file), and it has about 85 separate rules.
> I tried running it on the "full" file, but it just took too damn long.
>
> So, I went about creating a much simpler parser (even
> dumber), so I could do some pre-parsing, to speed things up.
>
> The file looks like:
>
> (foo bar)
> (foo (bar baz))
> (foo "bar")
> (foo (bar "baz")
>
> And these levels of data could be several levels deep w/ data.  E.g.:
>
> (foo (bar baz)
>      (baz baz)
>      (baz (baz (baz(baz "bar")))))
>
> So, I dumbed down my grammar (as can be seen below) but it
> still takes longer than I have patience for ( > 10 minutes) to parse.
>
> Am I SOL with parsing this file use RecDescent or is
> something glaringly bad w/ the below syntax?
>
> TIA
>
> --dw
>
> ############################################################
> # The main file has a header, and one or more object models
> File : Header Model(s)
>
> # Define what the header is
> Header:
>     "(" /Header[\s]v[\d]+\.[\d]+\.[\d]+\.[\d]+/ ")"
>     | <error: Invalid Header>
>
> # Define what the object model is
> Model:
>     "(Model"
>         Item(s)
>     ")"
>     | <error: Invalid parse of the ObjectModel>
>
> Item:
>     "(" /\b[^\s]+\b/ /[^\(\)]*/ Item(s?) ")"    # Simply two tokens
>     | "(" /\b[^\s]+\b/ "\"" /[^\"]*/ "\"" Item(s?) ")"
>     | <error>
>
>
> # These items left in for clarity's sake.  Functionally equivalent
> # to Item above, but hopefully faster
> OldItem:
>     "(" Label Data Item(s?) ")"    # Simply two tokens
>     | "(" Label QuotedData Item(s?) ")"
>     | <error>
>
> Label:
>     /\b[^\s]+\b/
>
> Data:
>     /[^\(\)]*/
>
> QuotedData:
>     "\"" /[^\"]*/ "\""
> ############################################################
>
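
For reference, the "much simpler parser" pre-parsing idea from the quoted message can be sketched in plain Perl, without Parse::RecDescent. This is only a rough sketch under assumptions: parse_nested is a hypothetical helper name (not part of any module), it splits unquoted data on whitespace rather than keeping it as one string the way the Data rule does, and it has not been timed against bigfile.txt. It walks the text once using \G with the /gc modifiers and builds nested array refs:

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not part of Parse::RecDescent):
# turns "(foo (bar baz))" style text into nested array refs.
# Quoted strings are kept as single tokens, with their quotes intact.
sub parse_nested {
    my ($text) = @_;
    my @stack = ( [] );    # $stack[0] gathers the top-level lists

    # One token per iteration: "(", ")", a quoted string, or a bare word.
    while ( $text =~ /\G\s*(\(|\)|"[^"]*"|[^\s()"]+)/gc ) {
        my $tok = $1;
        if ( $tok eq '(' ) {
            my $list = [];
            push @{ $stack[-1] }, $list;    # attach to the enclosing list
            push @stack, $list;             # and descend into it
        }
        elsif ( $tok eq ')' ) {
            die "Unbalanced ')'\n" if @stack == 1;
            pop @stack;                     # back up one level
        }
        else {
            push @{ $stack[-1] }, $tok;     # label or data token
        }
    }

    # Anything left over besides whitespace (e.g. an unterminated quote)
    # is an error; /c above keeps pos() at the end of the last good token.
    $text =~ /\G\s*\z/gc
        or die sprintf "Unexpected input at offset %d\n", pos($text) // 0;
    die "Unclosed '('\n" if @stack > 1;

    return $stack[0];
}

# Usage, on a fragment shaped like the sample data in the quoted message:
my $tree = parse_nested('(foo (bar baz) (baz (baz "bar")))');
print $tree->[0][0], "\n";          # label of the first top-level list
print scalar @{ $tree->[0] }, "\n"; # label plus its two nested lists
```

The nested array refs could then be checked against the detailed rules in a second pass, so the regex engine only has to scan the 1.5Mb of text once. Whether that is actually fast enough for the full file would need measuring.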