To clarify, you may assume that lines in the string are separated by
"\n", but any solution must pass the following edge cases:
 1) empty string: @lines should contain zero elements
 2) a string consisting of a single "\n": @lines should contain one empty element
 3) trailing empty lines should be retained
 4) you may not assume that the string ends with a newline
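
The four rules can be turned into a small table-driven check. This is a minimal sketch that assumes the split(/^/, ..., -1) plus chomp approach as the code under test. split_lines and the case labels are hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper under test: split at every line start, then chomp.
sub split_lines {
    my ($str) = @_;
    my @lines = split /^/, $str, -1;
    chomp @lines;
    return @lines;
}

# Each case: label, input string, expected list of lines.
my @cases = (
    [ 'empty string',         "",        []            ],
    [ 'lone newline',         "\n",      [""]          ],
    [ 'trailing empty lines', "a\n\n\n", ["a", "", ""] ],
    [ 'no final newline',     "a\nb",    ["a", "b"]    ],
);

my $failures = 0;
for my $case (@cases) {
    my ($label, $input, $want) = @$case;
    my @got = split_lines($input);
    # Compare the counts first: () and ("") both join to "".
    my $ok = @got == @$want && join("\0", @got) eq join("\0", @$want);
    printf "%-4s %s\n", ($ok ? 'ok' : 'FAIL'), $label;
    $failures++ unless $ok;
}
die "$failures edge case(s) failed\n" if $failures;
```

Any candidate can be checked the same way by swapping out the body of split_lines.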

For cheap thrills, I benchmarked some solutions that pass all
the edge cases.

use strict;
use Benchmark;
my $x = <<'FLAMING_OSTRICHES';

This is first test line

This is 2nd
And 3rd

FLAMING_OSTRICHES
sub a1 { my @lines = split(/^/, $x, -1); chomp(@lines) }
sub a2 { my @lines = $x eq "" ? () : $x =~ /^.*/mg }
sub j1 { my @lines = map { chomp; $_ } split /^/, $x, -1 }
# w1 needs Perl 5.8.0 or later (in-memory filehandles)
sub w1 { open(my $fh, "<", \$x); my @lines = <$fh>; chomp(@lines) }
timethese(600000, {
   'a1'     => \&a1,
   'a2'     => \&a2,
   'j1'     => \&j1,
   'w1'     => \&w1,
});
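
The `$x eq ""` test in a2 is needed for edge case 1. Matched against an empty string, /^.*/mg still succeeds once (a zero-length match at position 0), so without the guard the result is one empty element instead of none. A minimal sketch; unguarded and guarded are hypothetical helpers that take the string as an argument instead of reading $x:

```perl
use strict;
use warnings;

# Hypothetical helpers: a2's pattern with and without its empty-string guard.
sub unguarded { my ($s) = @_; return $s =~ /^.*/mg }
sub guarded   { my ($s) = @_; return $s eq "" ? () : $s =~ /^.*/mg }

my @u = unguarded("");
my @g = guarded("");

print "without guard: ", scalar(@u), " element(s)\n";   # 1 (a single "")
print "with guard:    ", scalar(@g), " element(s)\n";   # 0
```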

Results on Linux, Perl 5.8.0:
 a1: 27 wallclock secs (15.06 usr +  0.01 sys = 15.07 CPU)
 a2: 42 wallclock secs (24.06 usr +  0.04 sys = 24.10 CPU)
 j1: 49 wallclock secs (27.84 usr +  0.04 sys = 27.88 CPU)
 w1: 101 wallclock secs (62.74 usr +  0.04 sys = 62.78 CPU)
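
On a newer perl, Benchmark's cmpthese (exported on request) prints a chart of rates and relative percentages, which is easier to compare than raw CPU seconds. This sketch assumes a modern perl. The %candidates table is a hypothetical restructuring of the four subs so that each one takes the string as an argument, and a count of -1 asks Benchmark to run each candidate for at least one CPU second. It also checks that every candidate agrees with a1 on $x before timing anything.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Same contents as the FLAMING_OSTRICHES here-doc above.
my $x = "\nThis is first test line\n\nThis is 2nd\nAnd 3rd\n\n";

# Hypothetical table: each candidate takes the string and returns chomped lines.
my %candidates = (
    a1 => sub { my @l = split /^/, $_[0], -1; chomp @l; @l },
    a2 => sub { $_[0] eq "" ? () : $_[0] =~ /^.*/mg },
    j1 => sub { map { chomp; $_ } split /^/, $_[0], -1 },
    w1 => sub {
        open my $fh, '<', \$_[0] or die "open: $!";
        my @l = <$fh>;
        chomp @l;
        @l;
    },
);

# Sanity check: all candidates must return the same lines as a1.
my $want = join "\0", $candidates{a1}->($x);
for my $name (sort keys %candidates) {
    my $got = join "\0", $candidates{$name}->($x);
    die "$name disagrees with a1\n" unless $got eq $want;
}

# Wrap each candidate so it always runs in list context, as in the original subs.
my %timed;
for my $name (keys %candidates) {
    my $code = $candidates{$name};
    $timed{$name} = sub { my @lines = $code->($x) };
}

# Prints the comparison chart and returns it as a reference to an array of rows.
my $rows = cmpthese(-1, \%timed);
```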

Why is a1 fastest? I'm not sure, but I noticed that the Camel says, regarding split:
"the patterns /\s+/, /^/ and / / are specially optimized".
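
perlfunc documents a related special case: split treats a PATTERN of /^/ as if it were /^/m, because an anchor that matches only at the start of the string would be useless for splitting. A small sketch contrasting this with an ordinary match, where /^/ without /m anchors only at the start of the string (the sample string is arbitrary):

```perl
use strict;
use warnings;

my $s = "one\ntwo\nthree";

# In split, /^/ is treated as /^/m, so both calls split at every line start.
my @implicit = split /^/,  $s;
my @explicit = split /^/m, $s;

# In an ordinary match, /^/ without /m anchors only at the start of the string.
my $plain = () = $s =~ /^/g;
my $multi = () = $s =~ /^/mg;

print "split /^/  : ", scalar(@implicit), " fields\n";   # 3 fields
print "split /^/m : ", scalar(@explicit), " fields\n";   # 3 fields
print "m/^/g  matches: $plain\n";                        # 1
print "m/^/mg matches: $multi\n";                        # 3
```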

BTW, an interesting technique, described at
 http://www.ccl4.org/~nick/P/Fast_Enough/
is to examine the ops each version compiles to, for example with B::Terse:
 perl -MO=Terse -e'my @lines = split(/^/, $x, -1); chomp(@lines)'
 perl -MO=Terse -e'my @lines = map { chomp; $_ } split /^/, $x, -1;'

/-\