On Fri, 10 Aug 2001, Michel Lambert wrote:
>
> 1) What's the fastest way to strip trailing whitespace ( as defined by
> /\s/ ) from a decent-sized string? Decent-sized means this email, or 100 of
> these emails concatenated together.
Yet Another Benchmark -- conclusions first:
* For any nontrivial text input, the "1 while s/\s\z//" approach is by
  far the fastest.
* The sexeger approach (reverse the string, strip leading whitespace,
  reverse it back) comes second, with "s/\s+\z//" being slowest.
* The performance of "1 while s/\s\z//" is linear in the amount of
  trailing whitespace; sexeger performance is obviously dominated by
  the reversal, making it linear in the string length; and "s/\s+\z//"
  scales more or less linearly with the number of false starts.
* There is no measurable difference between "s/\s+\z//" and "s/\s+$//".
My original assumption was indeed that "s/\s+\z//" should be fastest, as
the optimization needed to make it so seems trivial. Apparently, however,
the current perl regex implementation does not perform it.
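
For reference, the three approaches compared above can be written as
standalone subs. This is only a sketch for checking that they produce the
same result; the sub names and the sample string are made up for
illustration and are not part of the benchmark itself.

```perl
use strict;
use warnings;

# Three ways to strip trailing whitespace; each returns a trimmed copy.
# The sub names are hypothetical, chosen for this illustration only.

sub trim_plus {             # single substitution anchored at the end
    my ($s) = @_;
    $s =~ s/\s+\z//;
    return $s;
}

sub trim_repeat {           # remove one trailing whitespace char per pass
    my ($s) = @_;
    1 while $s =~ s/\s\z//;
    return $s;
}

sub trim_sexeger {          # reverse, strip leading whitespace, reverse back
    my ($s) = @_;
    $s = reverse $s;        # scalar assignment, so the string is reversed
    $s =~ s/^\s+//;
    return scalar reverse $s;
}

my $sample = "  some text \n with inner whitespace \t \n\n";
for my $trim (\&trim_plus, \&trim_repeat, \&trim_sexeger) {
    my $out = $trim->($sample);
    die "unexpected result" unless $out eq "  some text \n with inner whitespace";
}
print "all three agree\n";
```

Only the trailing run is removed; leading and inner whitespace are left
alone in all three variants.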
The code I used for the benchmark is included below, followed by the
inputs I used for it (the sexeger entry was added later):
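use strict;
use Benchmark;
use vars qw/$data/;     # package variable, so the string snippets can see it

# Slurp the whole test input from STDIN.
{ local $/; $data = <STDIN>; }
die unless length $data;

# The first command-line argument is the iteration count.
timethese shift,
  { 'null'    => 'local $_ = $data;',    # baseline: cost of the copy alone
    'sexeger' => 'local $_ = $data; $_ = reverse $_; s/^\s+//; $_ = reverse $_',
    '$'       => 'local $_ = $data; s/\s+$//;',
    'z'       => 'local $_ = $data; s/\s+\z//;',
    'rep-z'   => 'local $_ = $data; 1 while s/\s\z//;',
  };
__END__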
And here are the inputs:
1. /usr/dict/words
2. /usr/dict/words, " " x 1024
3. " " x 65536
4. "x" x 63336, " " x 65536
5. " x" x 63336, " " x 65536
6. " x" x 63336
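
The synthetic inputs (3 to 6) can be generated directly in Perl. The sketch
below assumes that the comma in the list means concatenation; the helper
name make_input is hypothetical. Inputs 1 and 2 depend on the word list
file and are left out.

```perl
use strict;
use warnings;

# Builds the synthetic inputs 3-6 in memory, using the repeat counts
# exactly as listed. make_input is a hypothetical helper name.
sub make_input {
    my ($n) = @_;
    my %build = (
        3 => sub { " " x 65536 },
        4 => sub { ("x" x 63336) . (" " x 65536) },
        5 => sub { (" x" x 63336) . (" " x 65536) },
        6 => sub { " x" x 63336 },
    );
    my $gen = $build{$n} or die "no synthetic input number $n";
    return $gen->();
}

# Show the size of each generated input.
printf "input %d: %d bytes\n", $_, length(make_input($_)) for 3 .. 6;
```

Printing the result of make_input to a file gives something that can be
fed to the benchmark script on STDIN.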
On inputs 1, 2 and 6 the "1 while s/\s\z//" approach wins hands down.
On 3 and 4 it loses by almost two orders of magnitude, and with input
number 5 the "1 while s/\s\z//" and "s/\s+\z//" performances are equal
to within a factor of 2.
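
One way to relate these results to the shape of each input is to measure
the two quantities that matter: the length of the trailing whitespace, and
the number of whitespace runs that stop short of the end. Treating each
such run as one "false start" is an assumption made for this sketch, a
rough proxy and not a description of what the regex engine does
internally. The sub name is hypothetical.

```perl
use strict;
use warnings;

# Returns (length of the trailing whitespace run, number of whitespace
# runs that do not reach the end of the string).
# Assumption: each such run counts as one "false start" for s/\s+\z//.
sub whitespace_profile {
    my ($s) = @_;
    my ($tail) = $s =~ /(\s*)\z/;       # trailing run, possibly empty
    my $runs = () = $s =~ /\s+/g;       # count all whitespace runs
    $runs-- if length $tail;            # the final run is not a false start
    return (length $tail, $runs);
}

# Scaled-down versions of the shapes used for inputs 4, 5 and 6.
for my $s ("xxx" . "  ", " x x x" . "  ", " x x x") {
    my ($tail, $false_starts) = whitespace_profile($s);
    print "tail=$tail false_starts=$false_starts\n";
}
```

Under this reading, inputs 3 and 4 have a long tail and no false starts,
input 6 has many false starts and no tail, and input 5 has both in roughly
equal numbers.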
I added the sexeger entry after I had already done most of the testing,
so all I can say about it is that it doesn't beat "1 while s/\s\z//" for
the kinds of input the original message specifies. (And yes, I did test
it for shorter inputs than the ones listed above.)
--
Ilmari Karonen - http://www.sci.fi/~iltzu/
"It's possible to write a Perl program that simulates a universal Turing
machine, so, yes, your point is both valid and correct."
-- Greg Bacon in comp.lang.perl.misc