Thanks, Kim, for your help. I had used the same approach (a regex in Java 1.4), but I found it took a lot more time. So I thought it would be faster in Perl (which it is now). Thanks a lot for your help.
Thanks & Regards,
Debi

-----Original Message-----
From: Kim H. Young [mailto:khyoung@;civiltecheng.com]
Sent: Thursday, October 31, 2002 3:46 PM
To: Mohanty, Debi (MED, TCS); [EMAIL PROTECTED]
Subject: RE: deleting control Characters...... Please help

Debi,

I tried the solutions you've been presented with and wasn't satisfied I understood what the code was doing. For example, I've never had to iterate over @ARGV and had never used the "magic open" (<>) construct. So... I did some research, tried a bunch of new things (new to me, anyway) and learned a bunch of stuff in the process. Here's the code I came up with:

<CODE>
# strip control and non-ASCII characters from each file listed on the command line
# USAGE: strip.pl file.txt file.doc file.xls file.zip file.gif ...
#
use strict;
use warnings;

my $i;
my $file;
my $fout;
my $fhin;
my $fhout;

# loop on the argument count so a file named "0" is not skipped
for ($i=0; $i < @ARGV; $i++) {
    $file = $ARGV[$i];
    $fout = "$file.stripped";
    print "stripping >" . $file . "< to >" . $fout . "<\n";

    # three-argument open keeps odd characters in file names out of the mode
    open ($fhin, "<", $file) or die "Trouble opening $file: $!\n";
    open ($fhout, ">", $fout) or die "Trouble opening $fout: $!\n";
    binmode ($fhin);
    binmode ($fhout);

    while (<$fhin>) {
        # keep space, tab, newline, return, alphanumeric, punctuation;
        # \@ and \$ are escaped so Perl does not try to interpolate them
        s/[^ \t\n\r\w~!\@#\$%^&*()_+`\-=\[\]\\{}|;':",.\/<>?]+//g;
        print $fhout $_;
    }

    close($fhin) or die "Trouble closing $file: $!\n";
    close($fhout) or die "Trouble closing $fout: $!\n";
}
</CODE>

Please note that there is a <SPACE> character before the "\t" identifier. Originally, I used \s in my regex to identify whitespace characters (same as [ \t\n\r\f]) but I didn't like getting the form feed characters.

I also found I needed to open each input file in binary mode to get to the "good stuff" in Word and Excel files (and GIFs and...). Since you've got \0 characters in your files, you'll probably need binmode, too.

I am SURE there are better ways to do this, but for now, I'd best get back to work. :-) I never did figure out the implicit ARGV thing...
Cheers,
Kim

-----Original Message-----
From: Mohanty, Debi (MED, TCS) [mailto:Debi.Mohanty@;med.ge.com]
Sent: Thursday, October 31, 2002 11:29 AM
To: [EMAIL PROTECTED]
Subject: deleting control Characters...... Please help

Hi,

I have a data file (which I get from the mainframe) that contains some sort of control characters. Is there any way I can clean the file with Perl? When I tried to copy those characters from the file, I got a message saying "cannot cut, copy or drag and drop text containing null (code=0) characters". Is there any way I can remove these characters from the file using Perl?

Thanks & Regards,
Debi

_______________________________________________
ActivePerl mailing list
[EMAIL PROTECTED]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
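
For reference, the "magic open" (<>) and the implicit use of @ARGV mentioned in the thread can be sketched roughly as follows. This is only an illustration, not code from either poster: the helper name strip_control_chars is made up for the example, and it uses tr///d with an explicit list of control characters instead of the regex shown earlier. It assumes tab, newline and carriage return should be kept and every other ASCII control character (including NUL) should be deleted.

```perl
#!/usr/bin/perl
# Sketch only: strip_control_chars is a hypothetical helper name,
# not something from the original thread.
use strict;
use warnings;

# Delete NUL and the other ASCII control characters, keeping
# tab (\x09), newline (\x0A) and carriage return (\x0D).
sub strip_control_chars {
    my ($text) = @_;
    $text =~ tr/\x00-\x08\x0B\x0C\x0E-\x1F\x7F//d;
    return $text;
}

# "Magic open": with no filehandle named, <> opens each file listed
# in @ARGV in turn and returns its lines one at a time.
# The guard keeps the script from waiting on STDIN when no files are given.
if (@ARGV) {
    binmode STDOUT;
    while (my $line = <>) {
        print strip_control_chars($line);
    }
}
```

Assuming the script is saved as strip_ctrl.pl (a placeholder name), it could be run as "perl strip_ctrl.pl data.txt > data.clean.txt", with the cleaned text from all listed files going to standard output. Unlike the script in the thread, this sketch does not put the input files into binary mode, so on Windows the explicit open plus binmode approach may still be the safer choice for binary files.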
