Re: regexp vs substr for string capture?

Williamawalters Tue, 07 Mar 2006 02:36:55 -0800

In a message dated 3/6/2006 6:53:29 P.M. Eastern Standard Time, [EMAIL PROTECTED] writes:

> In a message dated 3/6/2006 5:37:39 P.M. Eastern Standard Time, [EMAIL PROTECTED] writes:
>
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED]
> > Sent: Monday, March 06, 2006 2:07 PM
> > Subject: regexp vs substr for string capture?
> > >Wizards,
> > >I thought regexp was the fastest way to pull strings out of another string;
> > >certainly faster than substr(). It seems I was wrong, as the following code
> > >showed
> >
> > I couldn't tell you this for sure but my common sense response is that a
> > regex has to do lots of comparing and the more complex it is the more
> > comparing it has to do. Your call to substr has two hard coded fixed values
> > (19 and -5) and requires no comparing. I'm not surprised to find that's so
> > much faster.   What you probably heard was that the entire process of
> > figuring out where to get the substring would take longer than a regex.
> > Especially for variable data! The more variance it has the better a general
> > purpose regex will perform vs. some logic you wrote and a call to substr.
> > If you have fixed width *by all means* use substr!
> >
> > Most of the time a regex is used it is with variable data lengths and/or
> > unknown input values. Try this one again with 100 lines of xml with
> > different data on each line. The logic required to use a substr to pull out
> > the tag on each line would be massive.
> >
> > I could be off base but that's my common sense understanding of what's going on
> > here, in your example.
> >
> > -Wayne
>
> deane --
>
> just for grins, try the following regex vs. substr().   i'm sure it will still be slower
> than substr(), but i'll bet the difference will be a lot less.

...and i'd lose that bet! see below.

> my $fixed_len_regex = qr/ ^ .{19} (.*) .{5} $ /x;
> my $data_line = '    <TD class=xc01>Wanted data goes here</TD>';
> my $tag;
>
> # etc...
>
>     'fixed length regex' => sub { ($tag) = $data_line =~ $fixed_len_regex },
>
> # and so on...
>
> regards -- bill walters

deane --

after doing some actual benchmarking, it turns out the ``fixed length regex'' is slower than
the ``new way'' regex, and in both cases the qr// object form is (slightly) slower still. my
guess is this is due to considerable optimization effort having been expended
on ``new way''-type patterns.

even when index() and rindex() are used to do limited pattern search (see code below) to
flexibly determine substr() start offsets and lengths, the substr() approach is still faster
than a regex in this limited case.

however, i would certainly agree very strongly that a regex approach is likely to be more
successful (e.g., less breakable) in the long run.
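
A minimal sketch of what "less breakable" can mean here. The second input line and the
capture pattern are assumptions made up for illustration, not taken from the original data:
once the opening tag changes length, the hard-coded offsets (19 and -5) no longer line up,
while a pattern anchored on '>' and '<' still finds the cell text.

```perl
use strict;
use warnings;

# First line is the sample from this thread; the second is a hypothetical
# variant whose opening tag has a different length.
my @lines = (
    '    <TD class=xc01>Wanted data goes here</TD>',
    '<TD class=xc01 align=left>Other data</TD>',
);

my ( @by_offset, @by_regex );
for my $line (@lines) {
    # hard-coded offsets: only right when the layout never changes
    push @by_offset, substr( $line, 19, -5 );

    # assumed pattern: text between the first '>' and the next '<'
    my ($text) = $line =~ / > ([^<]*) < /x;
    push @by_regex, $text;
}

for my $i ( 0 .. $#lines ) {
    print "offset: '$by_offset[$i]'  regex: '$by_regex[$i]'\n";
}
```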

best regards -- bill walters

# -------------------------------------------------------------------------------------------

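    'vari old way' => sub { # substr(), quasi-pattern match
        # content starts just after the first '>' (end of the opening tag)
        $start = index ($data_line, '>') + 1;
        # ...and ends just before the last '<' (start of the closing tag)
        $len   = rindex($data_line, '<') - $start;
        $tag9  = substr( $data_line, $start, $len );
    },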
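
For anyone who wants to rerun the comparison, here is a self-contained sketch built on the
core Benchmark module. The labels and the 'capture regex' pattern are assumptions (the
``new way'' regex itself is not quoted in this message); the fixed-length regex, the sample
line, and the 19 / -5 offsets come from the thread. Timings will vary by perl version and
machine, so none are shown.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $data_line = '    <TD class=xc01>Wanted data goes here</TD>';
my $tag;

my $fixed_len_regex = qr/ ^ .{19} (.*) .{5} $ /x;   # from the thread
my $capture_regex   = qr/ > ([^<]*) < /x;           # assumed stand-in pattern

my %extract = (
    'fixed substr' => sub { $tag = substr( $data_line, 19, -5 ) },
    'index substr' => sub {
        my $start = index( $data_line, '>' ) + 1;
        my $len   = rindex( $data_line, '<' ) - $start;
        $tag = substr( $data_line, $start, $len );
    },
    'fixed regex'   => sub { ($tag) = $data_line =~ $fixed_len_regex },
    'capture regex' => sub { ($tag) = $data_line =~ $capture_regex },
);

# Sanity check: every approach must extract the same text before timing it.
for my $name ( sort keys %extract ) {
    undef $tag;
    $extract{$name}->();
    die "$name failed\n"
        unless defined $tag && $tag eq 'Wanted data goes here';
}

# Run each sub for at least one CPU second and print a comparison table.
cmpthese( -1, \%extract );
```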

_______________________________________________
ActivePerl mailing list
[email protected]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
