In a message dated 3/6/2006 6:53:29 P.M. Eastern Standard Time,
[EMAIL PROTECTED] writes:
> In a message dated 3/6/2006 5:37:39 P.M. Eastern Standard Time,
> [EMAIL PROTECTED] writes:
> > > From: [EMAIL PROTECTED]
> > > [mailto:[EMAIL PROTECTED]
> > > Sent: Monday, March 06, 2006 2:07 PM
> > > Subject: regexp vs substr for string capture?
> > >
> > > Wizards,
> > >
> > > I thought regexp was the fastest way to pull strings out of another
> > > string; certainly faster than substr(). It seems I was wrong, as the
> > > following code showed
> >
> > I couldn't tell you this for sure, but my common-sense response is that a
> > regex has to do lots of comparing, and the more complex it is, the more
> > comparing it has to do. Your call to substr has two hard-coded values
> > (19 and -5) and requires no comparing. I'm not surprised to find that's so
> > much faster. What you probably heard was that the entire process of
> > figuring out where to get the substring would take longer than a regex,
> > especially for variable data! The more variance the data has, the better a
> > general-purpose regex will perform compared with some logic you wrote plus
> > a call to substr. If you have fixed-width data, *by all means* use substr!
> >
> > Most of the time a regex is used, it is with variable data lengths and/or
> > unknown input values. Try this one again with 100 lines of XML with
> > different data on each line. The logic required to use substr to pull out
> > the tag on each line would be massive.
> >
> > I could be off base, but that's my common-sense understanding of what's
> > going on here, in your example.
> >
> > -Wayne
>
> deane --
>
> just for grins, try the following regex vs. substr(). i'm sure it will
> still be slower than substr(), but i'll bet the difference will be a lot
> less.

...and i'd lose that bet! see below.

> my $fixed_len_regex = qr/ ^ .{19} (.*) .{5} $ /x;
> my $data_line = ' <TD class=xc01>Wanted data goes here</TD>';
> my $tag;
>
> # etc...
>
> 'fixed length regex' => sub { ($tag) = $data_line =~ $fixed_len_regex },
>
> # and so on...
>
> regards -- bill walters

deane --

after doing some actual benchmarking, it turns out the ``fixed length regex''
is slower than the ``new way'' regex, and in both cases the qr// object form
is (slightly) slower still. my guess is this is due to considerable
optimization effort having been expended on ``new way''-type patterns.

even when index() and rindex() are used to do a limited pattern search (see
code below) to flexibly determine substr() start offsets and lengths, the
substr() approach is still faster than a regex in this limited case.

however, i would certainly agree very strongly that a regex approach is
likely to be more successful (e.g., less breakable) in the long run.

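Wayne's point about variable data, and the "less breakable" point above, can be seen with a few lines whose indentation and attributes differ. This is a minimal sketch: the sample lines and the m{<TD[^>]*>(.*?)</TD>}i pattern are hypothetical illustrations, not code from this thread.

```perl
use strict;
use warnings;

# Hypothetical sample lines: the same kind of cell, but the indentation
# and the attributes vary from line to line.
my @lines = (
    '    <TD class=xc01>Wanted data goes here</TD>',
    '<TD class=xc01>no indentation</TD>',
    '      <TD class=xc02 align=right>42</TD>',
);

my @results;
for my $line (@lines) {
    # Fixed offsets: only correct when the prefix is exactly 19 characters
    # and the suffix exactly 5.
    my $by_substr = substr($line, 19, -5);

    # Hypothetical pattern: finds the opening and closing tags wherever
    # they fall on the line.
    my ($by_regex) = $line =~ m{<TD[^>]*>(.*?)</TD>}i;

    push @results, [ $by_substr, $by_regex ];
    print "substr: [$by_substr]  regex: [$by_regex]\n";
}
```

On the first line both approaches return "Wanted data goes here"; on the second and third, substr() returns "ndentation" and "2 align=right>42", while the regex still returns "no indentation" and "42".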
best regards -- bill walters

# -------------------------------------------------------------------
'vari old way' => sub {    # substr(), quasi-pattern match
    $start = index($data_line, '>') + 1;        # just past the opening tag's '>'
    $len   = rindex($data_line, '<') - $start;  # up to the closing tag's '<'
    $tag9  = substr($data_line, $start, $len);
},
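The benchmark snippets in this thread appear to be entries from a larger Benchmark cmpthese() (or timethese()) hash that is not shown. A minimal self-contained sketch that assembles them might look like the following. Assumptions: the data line has four leading spaces so that the hard-coded offset 19 is correct (the mail archive may have collapsed the spacing), and the 'tag regex' entry is only a stand-in, since the ``new way'' pattern is not quoted here. Timings depend on the machine and perl version, so no results are shown.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumption: four leading spaces, so that offset 19 lands just past
# '<TD class=xc01>'.
my $data_line       = '    <TD class=xc01>Wanted data goes here</TD>';
my $fixed_len_regex = qr/ ^ .{19} (.*) .{5} $ /x;

# Each candidate takes the line and returns the text between the tags.
my %extract = (
    'substr fixed'    => sub { substr($_[0], 19, -5) },
    'fixed len regex' => sub { ($_[0] =~ $fixed_len_regex)[0] },
    # Stand-in only: not the thread's ``new way'' pattern.
    'tag regex'       => sub { ($_[0] =~ />([^<]*)</)[0] },
    'index/rindex'    => sub {
        my $start = index($_[0], '>') + 1;        # just past the first '>'
        my $len   = rindex($_[0], '<') - $start;  # up to the last '<'
        substr($_[0], $start, $len);
    },
);

# Sanity check: all candidates must agree before timing them.
for my $name (sort keys %extract) {
    my $got = $extract{$name}->($data_line);
    die "$name extracted '$got'\n" unless $got eq 'Wanted data goes here';
}

# Run each candidate for at least 1 CPU second and print a comparison
# table. Every candidate pays the same sub-call overhead.
cmpthese(-1, {
    map {
        my $f = $extract{$_};
        ($_ => sub { my $tag = $f->($data_line) });
    } keys %extract
});
```

Wrapping each candidate in a sub call keeps the comparison fair but adds a constant cost to every entry, so relative differences may look smaller than with inline subs like the ones quoted above.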

_______________________________________________
ActivePerl mailing list
[email protected]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
