Tom Kinzer wrote:
>
> Rob Dixon wrote:
> >
> > Eric Sand wrote:
> > >
> > > I am very new to Perl, but I sense a great adventure ahead after just
> > > programming with Cobol, Pascal, and C over the last umpteen years. I
> > > have written a perl script where I am trying to detect a non-printing
> > > character (Ctrl@ - Ctrl_) and then substitute a printing ASCII
> > > sequence such as "^@" in its place, but it does not seem to work as I
> > > would like. Any advice would be greatly appreciated.
> > >
> > > Thank You....Eric Sand
> > >
> >
> > Your obvious guess is to write Perl as if it were C. That's slightly
> > better than treating it as a scripting language, but there are many joys
> > left to be found!
> >
> > > $in_ctr=0;
> > > $out_ctr=0;
> > >
> > > while ($line = <STDIN>)
> > > {
> > >     chomp($line);
> > >     $in_ctr ++;
> > >     if ($line = s/\c@,\cA,\cB,\cC,\cD,\cE,\cF,\cG,\cH,\cI,\cJ,\cK,
> > >                   \cL,\cM,\cN,\cO,\cP,\cQ,\cR,\cS,\cT,\cU,\cV,\cW,
> > >                   \cX,\cY,\cZ,\c[,\c\,\c],\c^,\c_
> > >                  /^@,^A,^B,^C,^D,^E,^F,^G,^H,^I,^J,^K,
> > >                   ^L,^N,^N,^O,^P,^Q,^R,^S,^T,^U,^V,^W,
> > >                   ^X,^Y,^Z,^[,^\,^],^^,^_/)
> > >     {
> > >         $out_ctr ++;
> > >         printf("Non-printing chars detected in: %s\n",$line);
> > >     }
> > > }
> > > printf("Total records read = %d\n",$in_ctr);
> > > printf("Total records written with non-printing characters = %d\n",$out_ctr);
> >
> > I would write this as below. The first thing is to *always*
> >
> >   use strict;
> >   use warnings;
> >
> > after which you have to declare all of your variables with 'my'.
> >
> > The second is to get used to using the default $_ variable which
> > is set to the value for the current 'while(<>)' or 'for' loop
> > iteration, and is a default parameter for most built-in functions.
> >
> > Finally, in your particular case you're using the s/// (substitute)
> > operator wrongly. The first part, s/here//, is a regular expression,
> > not a list of characters. You'll need to read up on these at
> >
> >   perldoc perlre
> >
> > The second part, s//here/, is a string expression which can use
> > 'captured' sequences (anything in parentheses) from the first part
> > and, with the addition of the s///e (executable) qualifier, can
> > also be an executable statement. Here I've used it to add 0x40
> > to the ASCII value of the control character grabbed by the regex.
> >
> > A lot of this won't make sense until you learn some more, but I
> > hope you'll agree that this code is cuter than your original?
> >
> > HTH,
> >
> > Rob
> >
> >
> > use strict;
> > use warnings;
> >
> > my $in_ctr = 0;
> > my $out_ctr = 0;
> >
> > while (<>) {
> >
> >   chomp;
> >
> >   $in_ctr++;
> >
> >   if (s/([\x00-\x1F])/'^'.chr(ord($1) + 0x40)/eg) {
> >     $out_ctr++;
> >     printf "Non-printing chars detected in: %s\n", $_;
> >   }
> > }
> >
> > printf "Total records read = %d\n", $in_ctr;
> > printf "Total records written with non-printing characters = %d\n",
> >     $out_ctr;
>
> Rob, can you explain the details of that replace? That's pretty slick. I
> see you're adding the hex value to get to the appropriate ASCII value, but
> didn't know you could do some of that gyration inside a regex.

I didn't think it was slick at all. In fact I was disappointed that it
looked such a mess, but I don't see a better way. Anyway, the statement is

  s/([\x00-\x1F])/'^'.chr(ord($1) + 0x40)/eg

where the regex is

  ([\x00-\x1F])

The enclosing parentheses capture the matched character as $1 for use later
in the replacement expression or even in a later statement. Within that is a
character class [ .. ] which is simply all control characters. It's the
first 'column' of the 7-bit 128-character ASCII set with byte values 0
through 31 or 0x00 through 0x1F. It could also be expressed as

  [[:cntrl:]]

which describes what you /mean/ rather than how your machine should do it.
It is not quite identical, though: [[:cntrl:]] also matches DEL (0x7F),
which the arithmetic below doesn't cater for.

OK, so we've captured one control character into $1. Then comes the
replacement string, which can be an executable expression with the /e
modifier on the substitution. Note that for simple interpolation of
variables like the captured $1, $2 etc, and in fact any variable (including
arrays and hash elements) in scope, there is no need for /e. It is only
necessary if there are operators or subroutines that need to be executed to
build the replacement string.

It's a mess because there is no way of relating control characters (e.g. CR)
with their alphabetic equivalents (e.g. CTRL/M) without doing character
arithmetic. And that's not what characters do in /real/ life.

In

  '^'.chr(ord($1) + 0x40)

  ord($1)   returns the byte value of the control character.

  + 0x40    moves that byte value from the first column (control
            characters) to the third column (capital alphas).

  chr()     turns that byte value back into a one-character ASCII string.

  '^'.      prepends a caret to that character.

Hence "\cM" becomes '^M'.

All that is left is the /g modifier, which replaces all instances of the
regex instead of just the first one found.

I hope this helps. It's useful for me to tie down my programming to first
principles once in a while and ask /why/ did I write that?

Cheers guys.

Rob
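
For anyone who wants to try the substitution outside the read loop, here is a minimal sketch that wraps it in a subroutine. The name caret_escape and the sample string are made up for illustration and do not appear in the thread; the character class assumes the intended range is 0x00 through 0x1F.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): make ASCII control characters
# visible in caret notation, so that "\cM" becomes "^M".
sub caret_escape {
    my ($text) = @_;

    # [\x00-\x1F] matches the 32 ASCII control characters.
    # /e evaluates the replacement as Perl code; /g repeats for every match.
    # s/// returns the number of substitutions, or false if there were none.
    my $count = ($text =~ s/([\x00-\x1F])/'^' . chr(ord($1) + 0x40)/eg);

    return ($text, $count || 0);
}

# TAB is 0x09 and CR is 0x0D, so adding 0x40 gives 'I' and 'M'.
my ($shown, $hits) = caret_escape("one\ttwo\r");
print "$shown ($hits replaced)\n";    # prints: one^Itwo^M (2 replaced)
```

Working on a copy of the string and returning the count keeps the caller's data untouched, and the count plays the same role as the truth test in the 'if' of the loop version.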