On Fri, 10 Aug 2001, Michel Lambert wrote:
> 
> 1) What's the fastest way to strip trailing whitespace ( as defined by
> /\s/ ) from a decent-sized string? Decent-sized means this email, or 100 of
> these emails concatenated together.

Yet Another Benchmark -- conclusions first:

 * For any nontrivial text input, the "1 while s/\s\z//" approach is by
   far the fastest.

 * The sexeger approach (reverse the string, strip the leading
   whitespace, reverse it back) comes second, with "s/\s+\z//" being
   slowest.

 * The running time of "1 while s/\s\z//" is linear in the amount of
   trailing whitespace; sexeger is dominated by the reversal (making
   it linear in string length); and "s/\s+\z//" scales more or less
   linearly with the number of false starts, i.e. places where \s+
   matches but \z then fails.

 * There is no measurable difference between "s/\s+\z//" and "s/\s+$//".
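
For reference, here is a minimal sketch of the three approaches wrapped as
subs, just to show that they return the same result on a small string.
The sub names are made up for this illustration and are not part of the
benchmark itself:

```perl
use strict;
use warnings;

# Each sub takes a string and returns a copy without trailing whitespace.
# The names are hypothetical; only the regexes come from the benchmark.

sub strip_plus_z {
    my ($s) = @_;
    $s =~ s/\s+\z//;            # one greedy match anchored at the end
    return $s;
}

sub strip_rep_z {
    my ($s) = @_;
    1 while $s =~ s/\s\z//;     # drop one trailing character per pass
    return $s;
}

sub strip_sexeger {
    my ($s) = @_;
    $s = scalar reverse $s;     # flip the string
    $s =~ s/^\s+//;             # trailing whitespace is now leading
    return scalar reverse $s;   # flip it back
}

my $text = "some text \t\n  ";
for my $strip (\&strip_plus_z, \&strip_rep_z, \&strip_sexeger) {
    print "[", $strip->($text), "]\n";
}
```

All three calls should print the same bracketed string; only the running
time differs.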


My original assumption was indeed that "s/\s+\z//" should be fastest, as
the optimization to make it so seems trivial.  Apparently, however, the
current perl regex implementation does not perform it.
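
To get a rough feel for how many false starts a given input offers, one
can count the whitespace runs that do not reach the end of the string.
This is only a proxy, on the assumption that each such run costs the
regex engine at least one failed attempt; the helper name below is made
up for this sketch:

```perl
use strict;
use warnings;

# Hypothetical helper: counts whitespace characters that are immediately
# followed by a non-whitespace character.  Every interior whitespace run
# ends in exactly one such character, so this is the number of runs that
# do not reach the end of the string.
sub count_interior_runs {
    my ($s) = @_;
    my $n = () = $s =~ /\s(?=\S)/g;   # count matches via list assignment
    return $n;
}

# Scaled-down versions of the input shapes used in the benchmark.
my %sample = (
    'interior only'     => (" x" x 4),
    'trailing only'     => ("x" x 4) . (" " x 4),
    'interior+trailing' => (" x" x 4) . (" " x 4),
);

printf "%-18s %d\n", $_, count_interior_runs($sample{$_})
    for sort keys %sample;
```

The two samples with interior runs report 4, the trailing-only one
reports 0.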

The code I used for the benchmark is included below, followed by the
inputs I used for it (the sexeger entry was added later):

  use strict;
  use Benchmark;

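  # $data must be a package variable: Benchmark evals the code strings
  # outside this file's lexical scope, so a "my" variable would be
  # invisible to them.
  use vars qw/$data/;
  { local $/; $data = <STDIN>; }   # slurp all of STDIN
  die unless length $data;

  # The first command-line argument is the iteration count.
  timethese shift,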
    { 'null'   => 'local $_ = $data;',
      'sexeger'=> 'local $_ = $data; $_ = reverse $_; s/^\s+//; $_ = reverse $_',
      '$'      => 'local $_ = $data; s/\s+$//;',
      'z'      => 'local $_ = $data; s/\s+\z//;',
      'rep-z'  => 'local $_ = $data; 1 while s/\s\z//;',
    };

  __END__

And here are the inputs:

  1. /usr/dict/words
  2. /usr/dict/words, " "x1024
  3. " " x 65536
  4. "x" x 63336, " " x 65536
  5. " x" x 63336, " " x 65536
  6. " x" x 63336
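
A small generator along these lines can reproduce the inputs.  It is a
sketch, not the script actually used: the names %input, slurp and $WORDS
are invented here, and on many current systems the word list lives at
/usr/share/dict/words rather than /usr/dict/words:

```perl
use strict;
use warnings;

# Path as given in the input list; adjust for the local system.
my $WORDS = '/usr/dict/words';

# Read a whole file into one string.
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "$path: $!";
    local $/;
    my $content = <$fh>;
    return $content;
}

# One generator per numbered input, built lazily so the word list is
# only read when inputs 1 or 2 are requested.
my %input = (
    1 => sub { slurp($WORDS) },
    2 => sub { slurp($WORDS) . (" " x 1024) },
    3 => sub { " " x 65536 },
    4 => sub { ("x" x 63336) . (" " x 65536) },
    5 => sub { (" x" x 63336) . (" " x 65536) },
    6 => sub { " x" x 63336 },
);

# With an argument, write the chosen input to STDOUT.
if (@ARGV) {
    my $gen = $input{ $ARGV[0] } or die "usage: $0 <1..6>\n";
    print $gen->();
}
```

Assuming this is saved as gen-input.pl and the benchmark as bench.pl
(both file names hypothetical), "perl gen-input.pl 5 | perl bench.pl 1000"
would time input 5 for 1000 iterations.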

On inputs 1, 2 and 6 the "1 while s/\s\z//" approach wins hands down.
On 3 and 4 it loses by almost two orders of magnitude, and with input
number 5 the "1 while s/\s\z//" and "s/\s+\z//" performances are equal
to within a factor of 2.

I added the sexeger entry after I had already done most of the testing,
so all I can say about it is that it doesn't beat "1 while s/\s\z//" for
the kinds of input the original message specifies.  (And yes, I did test
it for shorter inputs than the ones listed above.)

-- 
Ilmari Karonen - http://www.sci.fi/~iltzu/
"It's possible to write a Perl program that simulates a universal Turing
 machine, so, yes, your point is both valid and correct."
                                   --  Greg Bacon in comp.lang.perl.misc
