I wrote Parse::Gnaw to parse a file a little bit at a time using a real
grammar. I was dealing with multi-gigabyte text files that represented a
gate-level version of an ASIC.
As long as the text you are parsing can be defined in chunks and you can
flush at the end of each chunk, Parse::Gnaw ought to parse your text. If
you just want to capture some identifiers, it supports the equivalent of
capturing parens.
As for speed, it's about three or four times faster than Parse::RecDescent.
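To sketch the chunk-and-flush idea in plain Perl (this is only the
general shape of chunk-at-a-time scanning, not Parse::Gnaw's actual
interface; the file name and identifier pattern are made up):

    use strict;
    use warnings;

    my $file = 'netlist.v';    # hypothetical input file
    open my $fh, '<', $file or die "cannot open $file: $!";

    my $carry = '';
    while (read($fh, my $chunk, 1 << 20)) {    # one megabyte at a time
        $chunk = $carry . $chunk;

        # hold back a trailing partial word so it gets scanned with the
        # next chunk instead of being matched split in two
        $carry = ($chunk =~ s/([A-Za-z_]\w*)\z//) ? $1 : '';

        # capture identifiers -- the equivalent of capturing parens
        while ($chunk =~ /\b([A-Za-z_]\w*)\b/g) {
            print "identifier: $1\n";
        }

        # the chunk is now "flushed"; only $carry survives this pass
    }
    print "identifier: $carry\n" if length $carry;   # leftover word at EOF
    close $fh;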
Greg London
-----Original message-----
From: Kripa Sundar <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Sat, Feb 5, 2011 00:53:35 GMT+00:00
Subject: Re: [Boston.pm] Q: giant-but-simple regex efficiency
Thanks for the prompt replies, folks!
Unfortunately, my names can be embedded in larger "words" of the input
text, as long as they are delimited by certain punctuation.
If I can figure out all of the permitted punctuation, I will try out
a split() and a hash lookup.
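If I go that route, the sketch I have in mind is roughly this (the
delimiter class and the name list below are only placeholders):

    use strict;
    use warnings;

    # hypothetical list of names to search for
    my %wanted = map { $_ => 1 } qw(foo_reg bar_net clk_div);

    while (my $line = <STDIN>) {
        # split on the permitted punctuation (adjust the class to taste)
        for my $word (split /[\s.,;:()\[\]]+/, $line) {
            print "hit: $word\n" if $wanted{$word};
        }
    }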
But Regexp::Trie seems more likely to help. (I guess I was hoping that
the Perl regex compiler would automatically do that kind of optimal
regex construction, without the need for a module.)
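My understanding of the usage, assuming Regexp::Trie's new/add/regexp
interface (the name list is again a placeholder), is roughly:

    use strict;
    use warnings;
    use Regexp::Trie;

    my @names = qw(foo_reg foo_net foobar clk_div);

    my $rt = Regexp::Trie->new;
    $rt->add($_) for @names;

    # one trie-optimized pattern instead of a naive foo_reg|foo_net|...
    # alternation; wrap it in boundary assertions to enforce the
    # punctuation-delimited requirement
    my $re = $rt->regexp;

    while (my $line = <STDIN>) {
        while ($line =~ /$re/g) {
            print "matched: $&\n";
        }
    }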
peace, || Finding gifts that do not harm:
--{kr.pA} || http://www.dailygood.org/more.php?n=3159
--
It might look like I'm idle, but at the cellular level I'm really quite
busy.
_______________________________________________
Boston-pm mailing list
[email protected]
http://mail.pm.org/mailman/listinfo/boston-pm