Sorry, replying to myself, but I just stumbled across a similar situation and my solution might help you too.
I needed to define a block like this: perl until FLAG PERL FLAG; which acts like a 'here-doc' for inlining Perl in another language, without requiring the host language to actually parse the Perl code. Like your input, I need to match 'anything' up to the closing flag. I ended up using a rule similar to your original solution, except that instead of a bare /.*?/ match, I combined the minimal match with the next terminal. After playing around a bit, I came up with the following test script, which parses out all valid chunks between 'START' and 'END' amongst other rubbish in the input in one pass:

====== START CODE ======
use Parse::RecDescent;
use Data::Dumper;

#$::RD_TRACE = 1;

# assuming start/end delimiters of START and END
my $grammar = <<'STOP';
start:   chunk(s?)
chunk:   /.*?START/s command(s) 'END'   # This is the important bit
         {$item[2]}
command: 'test' ';'
         {"TEST COMMAND"}
STOP

my $text = << 'STOP';
blah blah blayh
asdsd kjkl START test; test; END kjsaljdlk askd
START
test;
END
sad asdgfdsf gfsfg
STOP

my $res = Parse::RecDescent->new($grammar)->start($text);
print Data::Dumper::Dumper($res), "\n";
====== END CODE ======

Note that the /s modifier on the 'garbage scooping' regexes is important for this to work: without it, '.' won't match newlines, so the minimal match can't scoop rubbish that spans line breaks. Was scratching my head over that for a bit :)

The output of that is:

====== START OUTPUT ======
$VAR1 = [
          [
            'TEST COMMAND',
            'TEST COMMAND'
          ],
          [
            'TEST COMMAND'
          ]
        ];
====== END OUTPUT ======

I haven't done any benchmarking, but that might be faster than sequential parses of 'clean' data. My original solution anchored to the end of the input with an eof marker and a 'trailing_guff' rule that matched anything after the chunk(s?) subrule, but that turned out to be unnecessary.

MB

2009/9/4 Matthew Braid <mattyb...@gmail.com>:
> Hi all,
>
> Would there be some way of manipulating the skip re to do this?
>
> Something along the lines of:
>
> top: <skip: /NOT START DELIMITER/> chunk(s) eof
> chunk: delimiter_start <skip: /NORMAL SKIP/> command(s) delimiter_end
> eof: /\Z/
>
> The problem there is defining a skip that won't skip a
> delimiter_start. This probably won't allow delimiter_start to _not_
> mean the start of a set of commands as well.
>
> Not tested, but just a suggestion.
>
> MB
>
> 2009/9/4 Mike Diehl <mdi...@diehlnet.com>:
>> On Thursday 03 September 2009 01:50:58 Damian Conway wrote:
>>> Hi Mike,
>>>
>>> > What I've tried amounts to this:
>>> >
>>> > chunk: /.*?/ delimiter_start command(s) delimiter_end /.*?/
>>>
>>> Unfortunately that won't work, because every regex in a PRD grammar is
>>> independent of the rest of the grammar, so even a minimal-matching .*?
>>> eats everything.
>>
>> Ya, that's what I was suspecting. In hindsight, I should have figured that;
>> that's how I'd write it...
>>
>>> Is there some reason you can't use something like:
>>>
>>> my $parser = Parse::RecDescent->new($grammar);
>>>
>>> $text =~ s{<DELIMITER> (.*?) </DELIMITER>}
>>>           { $parser->parse($1); q{} }gexs;
>>
>> That's what I was doing, but it seems I misinterpreted my profiling results.
>> I found from profiling that the function I use to create (once) and run the
>> parser accounted for 80% of runtime.
>>
>> I assumed that since I only create the parser once (if !defined), creating
>> the parser wasn't where the cost was. So I decided that it must be due to
>> actually running the parser, which might run several times during program
>> execution. My conclusion was that I needed to rewrite the grammar so that
>> the parser would only run once.
>>
>> It sounds like I may need to go back to the old algorithm and start tuning
>> the grammar.
>>
>> --
>>
>> Take care and have fun,
>> Mike Diehl.
>>
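
P.S. In case the /s gotcha above isn't obvious, here is a minimal standalone sketch (made-up input, no Parse::RecDescent needed) showing why the 'garbage scooping' match needs it. The \A anchor stands in for PRD matching each token regex at the current parse position:

====== START CODE ======
use strict;
use warnings;

# Hypothetical multi-line input: rubbish first, START on a later line.
my $text = "blah blah\nmore rubbish START payload";

# Without /s, '.' will not match "\n", so an anchored minimal match
# cannot scoop garbage across the line break to reach START.
my ($without_s) = $text =~ /\A(.*?)START/;

# With /s, '.' matches "\n" too, and the scoop succeeds.
my ($with_s) = $text =~ /\A(.*?)START/s;

print defined $without_s ? "matched without /s\n"
                         : "no match without /s\n";
print "with /s, scooped: <<$with_s>>\n";
====== END CODE ======

Without /s the first match fails outright; with /s it captures "blah blah\nmore rubbish " and leaves the parse position at START, which is exactly what the /.*?START/s rule in the grammar relies on.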