Re: Removing the blank spaces

Josh Perlmutter  Mon, 13 Mar 2006 13:17:31 -0800

[EMAIL PROTECTED] wrote on 03/13/2006 03:54:10 PM:

> In a message dated 3/13/2006 9:43:44 A.M. Eastern Standard Time, 
> [EMAIL PROTECTED] writes:
> 
> > [EMAIL PROTECTED] wrote on 03/11/2006 03:05:10 PM:
> > > Today's Topics:
> > >    4. Removing the blank spaces (Naresh Bajaj)
> > > ----------------------------------------------------------------------
> > > ------------------------------
> > > Message: 4
> > > Date: Sat, 11 Mar 2006 13:49:20 -0600
> > > From: "Naresh Bajaj" <[EMAIL PROTECTED]>
> > > Subject: Removing the blank spaces
> > > To: [email protected]
> > > Message-ID:
> > >    <[EMAIL PROTECTED]>
> > > Content-Type: text/plain; charset="iso-8859-1"
> > > 
> > > Hello,
> > > This is my problem. I have extracted one variable value from a file and
> > > saved it to another file.
> > > The problem is that it has too many spaces, as shown in this example. I
> > > want to remove those blank spaces.
> > > If I use split, /  / $fti, I am getting partial results as shown below.
> > > Please let me know how I can remove those spaces. I would appreciate it.
> > <examples removed>
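Since the question tried split, here is a minimal sketch of the split-based route, for comparison with the regex approach discussed below. It relies on a documented special case of split: a pattern given as the single-space string ' ' (not the regex / /) splits on runs of whitespace and skips leading whitespace, and split drops trailing empty fields by default. The sub name squeeze_with_split is hypothetical, and the sample string is the one used later in the thread; the original poster's code and data were not shown.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collapse whitespace runs and trim both ends.
# split ' ' splits on runs of whitespace and ignores leading whitespace;
# join ' ' glues the fields back together with single spaces.
sub squeeze_with_split {
    my ($text) = @_;
    return join ' ', split ' ', $text;
}

my $input = "  example    information in     a string   ";
print "{", squeeze_with_split($input), "}\n";   # {example information in a string}

# By contrast, the regex / / splits on every single space, so a run of
# spaces yields empty fields: this gives ("a", "", "b").
my @fields = split / /, "a  b";
print scalar(@fields), " fields\n";             # 3 fields
```

The empty fields produced by / / are one plausible reason a split on a single space gives only "partial results".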
> > split breaks a string into a list at each match of a boundary pattern. i
> > think that while it could be used for what you want, it would be a
> > round-about way; there are more direct methods. I suspect your dislike of
> > the result comes from not-quite-enough understanding of split (similar to
> > how too little information can be worse than none and creates panic
> > among plebs). instead of trying to deduce your code, i'll give you a
> > regexp that should give the desired results:
> > [code]
> > #! /usr/bin/perl
> > $input="  example    information in     a string   ";
> > $input =~ /\s+/ /g;
> 
> 
> should be s/\s+/ /g;   note the initial s in the s/// substitution. 

i used that religiously, as i did the leading m on m//, until a few months
back. now i only use it when the code isn't just for me and someone else will
need to maintain it.
they aren't needed, but they are great for maintainability.
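A small sketch of the distinction Bill points out, assuming the default / delimiter: on a match the leading m may be omitted, but a substitution needs its s. Written as `$input =~ /\s+/ /g;`, the statement is not parsed as a substitution, so $input is not modified. The variable names below are illustrative only.

```perl
use strict;
use warnings;

my $text = "a   b";

# Matching: m/.../ and /.../ are the same operator when / is the delimiter.
my $with_m    = ($text =~ m/\s+/) ? 1 : 0;
my $without_m = ($text =~ /\s+/)  ? 1 : 0;

# Substituting: the leading s is required.
# Copy first so that $text itself is left unchanged.
(my $squeezed = $text) =~ s/\s+/ /g;

print "$with_m $without_m {$squeezed}\n";   # 1 1 {a b}
```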

> 
> > print "$input\n";
> > $input =~ /^\s*(\S.*)\s*$/$1/;
> 
> 
> should be s/^\s*(\S.*)\s*$/$1/;
> 
> > print "$input\n";
> > [/code]
> > should print:
> > example information in a string 
> > example information in a string
> > note that the first one has an extra " " at the end. 
> 
> 
> actually, both will have an extra space if there was any trailing 
> whitespace at all. 
> 
> > it could also have 
> > more \n than intended. 
> 
> 
> the first substitution   s/\s+/ /g;   will remove any and all \n. 
> 
> > chomp removes that. 
> 
> 
> but there is no chomp(). 
> 
> > i'm not sure, but i don't believe it would remove leading white space. to
> > remove that, i used the second substitution instead. 
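To make the chomp point concrete, a minimal sketch with made-up strings: chomp removes a single trailing copy of the input record separator $/, which is "\n" by default. It does not touch leading or internal whitespace, spaces before the newline, or more than one trailing newline.

```perl
use strict;
use warnings;

my $line = "  some text  \n";
chomp $line;                 # strips only the final "\n"
print "{$line}\n";           # {  some text  }

my $two = "text\n\n";
chomp $two;                  # strips one newline; one is left behind
print length($two), "\n";    # 5
```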
> > 
> > > 
> > > I hope I clearly explained the problem. Please let me know if you are
> > > not clear about my issue. Thanks,
> > 
> > you did, and the potential for confusion over what split does is why i'm
> > now going to add a little explanation of what the regular expressions are
> > doing, in the hope that it helps teach you them. =o)
> > 
> > /\s+/ /g
> 
> again, should be s/\s+/ /g
> 
> > 
> > uses perl's shorthand \s, which matches "whitespace", including space,
> > \t, \n, \r and \f (form feed). + means "1 or more", so it'll find the
> > first run of white space and replace it with the next part, a single " ".
> > the g makes this global, so it is done throughout the whole string,
> > hitting all the occurrences.
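A short sketch of what s/\s+/ /g does to mixed whitespace, and of what the + contributes. The input strings are invented for illustration.

```perl
use strict;
use warnings;

# Tabs, newlines, carriage returns and form feeds all match \s,
# and \s+ treats each run of them as one unit.
my $mixed = "one\ttwo\n\nthree\r\n four\f five";
(my $collapsed = $mixed) =~ s/\s+/ /g;
print "{$collapsed}\n";    # {one two three four five}

# Without the +, each whitespace character is replaced on its own,
# so a run of three whitespace characters becomes three spaces.
(my $each = "a \t b") =~ s/\s/ /g;
print "{$each}\n";         # {a   b}
```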
> > 
> > /^\s*(\S.*)\s*$/$1/
> 
> again, should be s/^\s*(\S.*)\s*$/$1/
> 
> > 
> > again uses the \s shorthand and also uses \S. \S means [^\s] (any
> > non-whitespace character), the . is any character except a newline, and
> > the * means 0 or more. the $ at the end is an end-of-string anchor and
> > the ^ at the beginning is a beginning-of-string anchor.
> > this finds all the white space up to the first non-whitespace, then all
> > the white space at the end. it then replaces the entire line with $! 
> 
> typo:  $! should be $1
thank you for correcting that
> 
> > which is the capture from (\S.*), i.e. everything that is not the
> > leading or trailing white space.
> 
> 
> the problem here is that in the capturing parenthetic expression (\S.*)
> the .* has a ``greedy''  *  (zero or more) quantifier. this will consume
> everything to the end of the string (or to the first \n, but there are no
> newlines any more due to the action of the first substitution) and include
> those characters in the $1 capture variable. if there was a space at the
> end of the string, it will be a part of $1.
> the \s* just after the capturing parenthesis also has a greedy quantifier,
> but in a situation like this, the first greedy quantifier to the table
> wins the day: \s* can be satisfied with zero whitespace (although it would
> like more), so the regex as a whole is satisfied.
> 
> to fix this problem, make the  *  quantifier in  (\S.*)  ``lazy'' with a
> ?  modifier, i.e.  (\S.*?). this allows it to ``back off'' and let the
> \s*  gobble as much whitespace as it wants. see code examples below. 
> 
> > 
> > > --
> > > Naresh Bajaj, Intern,
> > > Cardiac Rhythm Disease Management,
> > > Medtronic Inc.,
> > > 763-514-3799
> > 
> > 
> > HTH
> > Josh Perlmutter
> 
> greedy  *  in  (\S.*)  can leave a space at end of the string: 
> 
> [code]
> $input = qq(  example   \n information in   \n\n  a string  );
> $input =~ s/\s+/ /g;
> print qq({$input}\n);
> $input =~ s/^\s*(\S.*)\s*$/$1/;
> print qq({$input}\n);
> [output]
> { example information in a string }
> {example information in a string }
> 
> lazy  *  in  (\S.*?)  leaves no space: 
> 
> [code]
> $input = qq(  example   \n information in   \n\n  a string  );
> $input =~ s/\s+/ /g;
> print qq({$input}\n);
> $input =~ s/^\s*(\S.*?)\s*$/$1/;
> print qq({$input}\n);
> [output]
> { example information in a string }
> {example information in a string}
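For comparison, a commonly used alternative that sidesteps the greedy/lazy question: collapse the whitespace, then strip each end with its own anchored substitution, so no capture group is needed. This is a different technique from the single-regex trim in the thread, and the sub name squeeze_and_trim is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse, then trim each end separately.
sub squeeze_and_trim {
    my ($text) = @_;
    $text =~ s/\s+/ /g;    # every whitespace run -> one space
    $text =~ s/^\s+//;     # remove leading whitespace
    $text =~ s/\s+$//;     # remove trailing whitespace
    return $text;
}

my $input = qq(  example   \n information in   \n\n  a string  );
print "{", squeeze_and_trim($input), "}\n";    # {example information in a string}
```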
> 
> (also, the  \S  in the parenthetic expressions is redundant.)
not sure. i do that to ensure the leading \s* goes all the way to a
non-space. remember, greedy....
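One way to check whether the \S matters is to run the lazy pattern from the thread with and without it. In this sketch (sub names hypothetical), the two versions agree whenever the string contains at least one non-whitespace character, because the leading ^\s* is greedy and already consumes all leading whitespace. They differ on an all-whitespace string: with \S the match fails and the string is left unchanged, while without it the string becomes empty.

```perl
use strict;
use warnings;

# The thread's lazy pattern, with and without the leading \S in the capture.
sub trim_with_S    { my ($t) = @_; $t =~ s/^\s*(\S.*?)\s*$/$1/; return $t; }
sub trim_without_S { my ($t) = @_; $t =~ s/^\s*(.*?)\s*$/$1/;   return $t; }

for my $sample (" a string ", "   ") {
    printf "{%s} {%s}\n", trim_with_S($sample), trim_without_S($sample);
}
# prints:
# {a string} {a string}
# {   } {}
```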
> 
> hth -- bill walters 
thanks for the catch on greedy vs. non-greedy

-Josh

_______________________________________________
ActivePerl mailing list
[email protected]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
