Hi -
I am trying to come up with a simple, elegant word-parsing script that:
* takes a scalar string,
* splits it into words, separating on whitespace, commas,
and a set of paired delimiters: "" '' // () {} [] <> ##, and
* returns the array of words.
So far I have:
# ----------------------------------------------------------------
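print( '-', join( '-&-', parse_words( $_ ) ), "-\n" ) for( @ARGV );

sub parse_words
{
    my $line = shift;
    my @words = ();
    local $_ = $line;    # work on a private $_ so the caller's is untouched
    while( 1 ) {
        # trim leading and trailing whitespace from what is left
        s/^\s*(.*?)\s*$/$1/;
        last unless length $_;
        pos( $_ ) = 0;
        # a leading delimited group: keep its contents as one word
        if( /^"(.*?)"/g   || /^'(.*?)'/g   ||
            /^\/(.*?)\//g || /^\((.*?)\)/g ||
            /^\{(.*?)\}/g || /^\[(.*?)\]/g ||
            /^<(.*?)>/g   || /^#(.*?)#/g
        ) {
            push @words, $1;
            $_ = substr $_, pos( $_ );
            next;
        }
        # otherwise take everything up to the next comma, if there is one
        if( /^(.*?),/g ) {
            push @words, $1;
            $_ = substr $_, pos( $_ );
            next;
        }
        # otherwise take everything up to the next run of whitespace
        if( /^(.*?)\s+/g ) {
            push @words, $1;
            $_ = substr $_, pos( $_ );
            next;
        }
        # nothing left to split on: the remainder is the last word
        push( @words, $_ ) if length $_;
        last;
    }
    @words;
}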
# ----------------------------------------------------------------
A test gives the correct results:
perl t.pl "\"mother's apple pie\" <randy 'lewis'>apple, corn dog,0 1 2"
-mother's apple pie-&-randy 'lewis'-&-apple-&-corn dog-&-0-&-1-&-2-
Now this is fine, and I can use it as is, but it seems a bit pedestrian
and heavy-handed. I tried, and failed, to write one using a single
all-in-one regex in a progressive-matching /g while loop.
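For reference, the general shape of such a loop might look like the sketch below. It is only an illustration under assumptions, not a tested replacement: the names parse_words_re, @pairs, $delimited and $token are made up for this example, and the (?|...) branch-reset group needs Perl 5.10 or later. It is meant to mirror the loop above, including the rule that a comma anywhere later in the line takes priority over whitespace.

```perl
use strict;
use warnings;

# Delimiter pairs handled by the script above (hypothetical table name).
my @pairs = ( [ '"', '"' ], [ "'", "'" ], [ '/', '/' ], [ '(', ')' ],
              [ '{', '}' ], [ '[', ']' ], [ '<', '>' ], [ '#', '#' ] );

# One alternative per pair: escaped opener, lazy capture, escaped closer.
my $delimited = join '|',
    map { quotemeta( $_->[0] ) . '(.*?)' . quotemeta( $_->[1] ) } @pairs;

# \G anchors each match where the previous one ended.
# (?|...) is a branch reset: whichever alternative matches captures into $1.
# Order matters: delimited group, then text up to a comma, then a bare word.
my $token = qr/\G\s*(?|$delimited|([^,]*),|(\S+))/;

sub parse_words_re    # hypothetical name, not part of the original script
{
    my $line = shift;
    my @words;
    # In scalar context, /g resumes at pos($line); the loop ends on failure.
    push @words, $1 while $line =~ /$token/g;
    return @words;
}

print( '-', join( '-&-', parse_words_re( $_ ) ), "-\n" ) for( @ARGV );
```

Run against the sample line used in the test above, this should split it into the same seven words. Building the alternation from a table keeps the escaping in one place (quotemeta), so adding another delimiter pair is a one-line change.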
Does anyone want to help me find elegance?
Aloha => Beau;