[EMAIL PROTECTED] wrote on 03/13/2006 03:54:10 PM:
> In a message dated 3/13/2006 9:43:44 A.M. Eastern Standard Time,
> [EMAIL PROTECTED] writes:
>
> > [EMAIL PROTECTED] wrote on 03/11/2006 03:05:10 PM:
> > > Today's Topics:
> > > 4. Removing the blank spaces (Naresh Bajaj)
> > > ----------------------------------------------------------------------
> > > Message: 4
> > > Date: Sat, 11 Mar 2006 13:49:20 -0600
> > > From: "Naresh Bajaj" <[EMAIL PROTECTED]>
> > > Subject: Removing the blank spaces
> > > To: [email protected]
> > > Message-ID:
> > > <[EMAIL PROTECTED]>
> > > Content-Type: text/plain; charset="iso-8859-1"
> > >
> > > Hello,
> > > This is my problem. I have extracted one variable value from a file and
> > > saved it to another file.
> > > The problem is that it has too many spaces, as shown in this example. I
> > > want to remove those blank spaces.
> > > If I use split / /, $fti, I get partial results, as shown below.
> > > Please let me know how I can remove those spaces. I appreciate it.
> > <examples removed>
> > split breaks a string into a list on a boundary. i think that while it
> > could be used for what you want, it would be so in a round-about way.
> > there are more direct methods. i suspect your dislike of the result comes
> > from an incomplete understanding of split (a little information can be
> > worse than none). instead of trying to deduce your code, i'll give you a
> > regexp that should give the desired results
> > [code]
> > #! /usr/bin/perl
> > $input=" example information in a string ";
> > $input =~ /\s+/ /g;
>
>
> should be s/\s+/ /g; note initial s in s/// substitution.
i used that religiously like i used the preceding m on m// religiously
until a few months back. now i only use it if it's not a personal thing
and someone else will need to upkeep.
they aren't needed but are great for upkeep.
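
A minimal sketch of the distinction, with made-up variable names: the m in front of // can be dropped when the delimiters are slashes, but the s in front of s/// cannot, because without it perl does not treat the line as a substitution at all.

```perl
use strict;
use warnings;

my $input = " example   information  in a    string ";

# m is optional when the delimiters are slashes: these two are the same match.
my $bare_match = ($input =~ /information/)  ? 1 : 0;
my $m_match    = ($input =~ m/information/) ? 1 : 0;

# s is not optional: it is what makes this a substitution.
(my $squeezed = $input) =~ s/\s+/ /g;

print "bare match: $bare_match, m match: $m_match\n";
print qq({$squeezed}\n);
```

The copy into $squeezed is only there so the original string stays around for comparison.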
>
> > print "$input\n";
> > $input =~ /^\s*(\S.*)\s*$/$1/;
>
>
> should be s/^\s*(\S.*)\s*$/$1/;
>
> > print "$input\n";
> > [/code]
> > should print:
> > example information in a string
> > example information in a string
> > note that the first one has an extra " " at the end.
>
>
> actually, both will have an extra space if there was any trailing
> whitespace at all.
>
> > it could also have
> > more \n than intended.
>
>
> the first substitution s/\s+/ /g; will remove any and all \n.
>
> > chomp removes that.
>
>
> but there is no chomp().
>
> > i'm not sure, and don't believe
> > it would remove leading white space. to remove that, i used the second
> > substitute instead.
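
For reference, a small sketch of what chomp does and does not do (the string is invented for illustration). chomp removes one trailing input record separator, "\n" by default, and returns the number of characters it removed. Leading whitespace and trailing spaces are left alone.

```perl
use strict;
use warnings;

my $line = "  example information \n";

# chomp strips only the trailing "\n"; the spaces on both sides survive.
my $removed = chomp(my $chomped = $line);

print "removed: $removed\n";
print qq({$chomped}\n);
```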
> >
> > >
> > > I hope I clearly explained the problem. Please let me know if you are
> > > not clear about my issue. Thanks,
> >
> > you did, and the potential for confusion over what split does is why
> > i'm now going to add a little explanation of what the regular expressions
> > are doing, in the hope that it helps teach you them. =o)
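
Since the original examples were removed, this is only a guess at what the "partial results" looked like. The sketch uses a hypothetical value for $fti. Splitting on a single literal space yields an empty field for every extra space, while the special pattern ' ' skips leading whitespace and splits on runs of whitespace.

```perl
use strict;
use warnings;

# hypothetical stand-in for the value held in $fti
my $fti = " example  information ";

my @on_single_space = split / /, $fti;   # "", "example", "", "information"
my @on_whitespace   = split ' ', $fti;   # "example", "information"

print scalar(@on_single_space), " fields from split / /\n";
print scalar(@on_whitespace),   " fields from split ' '\n";

# the round-about way: split on whitespace, then join with one space
my $rejoined = join ' ', @on_whitespace;
print qq({$rejoined}\n);
```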
> >
> > /\s+/ /g
>
> again, should be s/\s+/ /g
>
> >
> > uses the perl shorthand \s, which is [ \t\n\r\f], that is, "whitespace."
> > + means "1 or more" so it'll find the first run of white space and replace
> > it with the next part, a single " ". the g makes this global, so it is
> > done throughout the whole set of data, hitting all the occurrences.
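
A short sketch of what the g flag changes, using an invented string. Without g only the first run of whitespace is replaced. With g every run is replaced, and the substitution returns how many replacements it made.

```perl
use strict;
use warnings;

my $input = "a \t b\n\nc";

# without g: only the first run (space, tab, space) becomes one space
(my $first_only = $input) =~ s/\s+/ /;

# with g: every run, including the two newlines, becomes one space
my $global = $input;
my $count  = ($global =~ s/\s+/ /g);

# make the remaining newlines visible when printing
(my $shown = $first_only) =~ s/\n/\\n/g;
print qq(first only: {$shown}\n);
print qq(global:     {$global} after $count substitutions\n);
```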
> >
> > /^\s*(\S.*)\s*$/$1/
>
> again, should be s/^\s*(\S.*)\s*$/$1/
>
> >
> > again uses the \s shorthand and also uses the \S shorthand. \S means
> > [^\s], the . is anything (except a newline), and the * means 0 or more.
> > the $ at the end is an end-of-line anchor and the ^ at the beginning is a
> > beginning-of-line anchor.
> > this finds all the white space until the first non-whitespace, then all
> > the white space at the end. it then replaces the entire line with $!
>
> typo: $! should be $1
thank you for correcting that
>
> > which
> > is the capture from (\S.*) which is everything that is not the beginning
> > and ending white space.
>
>
> the problem here is that in the capturing parenthetic expression (\S.*)
> the .* has a ``greedy'' * (zero or more) quantifier. this will consume
> everything to the end of the string (or to the first \n, but there are no
> newlines any more due to the action of the first substitution) and include
> those characters in the $1 capture variable.
> if there was a space at the end of the string, it will be a part of $1.
> the \s* just after the capturing parenthesis also has a greedy quantifier,
> but in a situation like this, the first greedy quantifier to the table
> wins the day: \s* can be satisfied with zero whitespace (although it would
> like more), so the regex as a whole is satisfied.
>
> to fix this problem, make the * quantifier in (\S.*) ``lazy'' with a ?
> modifier: i.e., (\S.*?). this allows it to ``back off'' and let the \s*
> gobble as much whitespace as it wants. see code examples below.
>
> >
> > > --
> > > Naresh Bajaj, Intern,
> > > Cardiac Rhythm Disease Management,
> > > Medtronic Inc.,
> > > 763-514-3799
> >
> >
> > HTH
> > Josh Perlmutter
>
> greedy * in (\S.*) can leave a space at end of the string:
>
> [code]
> $input = qq( example \n information in \n\n a string );
> $input =~ s/\s+/ /g;
> print qq({$input}\n);
> $input =~ s/^\s*(\S.*)\s*$/$1/;
> print qq({$input}\n);
> [output]
> { example information in a string }
> {example information in a string }
>
> lazy * in (\S.*?) leaves no space:
>
> [code]
> $input = qq( example \n information in \n\n a string );
> $input =~ s/\s+/ /g;
> print qq({$input}\n);
> $input =~ s/^\s*(\S.*?)\s*$/$1/;
> print qq({$input}\n);
> [output]
> { example information in a string }
> {example information in a string}
>
> (also, the \S in the parenthetic expressions is redundant.)
not sure. i do that to ensure it goes until there is a non-space. remember
greedy....
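
On whether the \S matters: a small sketch, assuming the lazy form of the pattern and two hypothetical helper subs. For a string that contains any non-whitespace, the greedy ^\s* has already eaten the leading spaces, so both versions give the same result. They differ only when the string is all whitespace: with \S the match fails and the string is left untouched, without it the string becomes empty.

```perl
use strict;
use warnings;

# hypothetical helpers, one per variant of the pattern
sub trim_with_S {
    my ($s) = @_;
    $s =~ s/^\s*(\S.*?)\s*$/$1/;
    return $s;
}

sub trim_without_S {
    my ($s) = @_;
    $s =~ s/^\s*(.*?)\s*$/$1/;
    return $s;
}

for my $input ("  a string  ", "     ") {
    my $with    = trim_with_S($input);
    my $without = trim_without_S($input);
    print qq(with: {$with}  without: {$without}\n);
}
```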
>
> hth -- bill walters
thanks for the catch on the greedy v non-greedy
-Josh
_______________________________________________
ActivePerl mailing list
[email protected]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs