Thanks for your help, Kim.
I did use the same approach (a regex in Java 1.4), but I found it
much slower. So I thought it would be faster in Perl, and it is.
Thanks a lot for your help.

Thanks&Regards
Debi

-----Original Message-----
From: Kim H. Young [mailto:khyoung@;civiltecheng.com]
Sent: Thursday, October 31, 2002 3:46 PM
To: Mohanty, Debi (MED, TCS); [EMAIL PROTECTED]
Subject: RE: deleting control Characters...... Please help


Debi,

I tried the solutions you've been presented with and wasn't satisfied
that I understood what the code was doing. For example, I've never had
to iterate over @ARGV and had never used the "magic open" (<>)
construct. So... I did some research, tried a bunch of new things (new
to me, anyway) and learned a bunch of stuff in the process.

Here's the code I came up with:

<CODE>
# strip control and non-ASCII characters from each file listed on the command line
# USAGE: strip.pl file.txt file.doc file.xls file.zip file.gif ...
#
use strict;
use warnings;

for my $file (@ARGV) {
    my $fout = "$file.stripped";
    print "stripping >$file< to >$fout<\n";
    open(my $fhin,  '<', $file) or die "Trouble opening $file: $!\n";
    open(my $fhout, '>', $fout) or die "Trouble opening $fout: $!\n";
    binmode($fhin);
    binmode($fhout);
    while (<$fhin>) {
        # delete anything that is not space, tab, newline, return,
        # alphanumeric or punctuation; \@ and \$ are escaped so Perl
        # does not interpolate them as variables ($% would become "0")
        s/[^ \t\n\r\w~!\@#\$%^&*()_+`\-=[\]\\{}|;':",.\/<>?]+//g;
        print $fhout $_;
    }
    close($fhin)  or die "Trouble closing $file: $!\n";
    close($fhout) or die "Trouble closing $fout: $!\n";
}
</CODE>

Please note that there is a literal <SPACE> character before the "\t"
escape in the character class.

Originally, I used \s in my regex to identify whitespace characters
(same as [ \t\n\r\f]) but I didn't like getting the form feed
characters. I also found I needed to open each input file in binary
mode to get to the "good stuff" in Word and Excel files (and GIFs
and...). Since you've got \0 characters in your files, you'll probably
need binmode, too.
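A small sketch of the \s versus explicit-whitespace difference described
above. The sub names keep_with_s and keep_explicit are hypothetical, made
up for illustration. (In Perl 5.18 and later, \s also matches the
vertical tab \x0B, so the \s version lets that through too.)

```perl
use strict;
use warnings;

# Keep word characters plus whatever \s counts as whitespace (includes \f).
sub keep_with_s {
    my ($s) = @_;
    $s =~ s/[^\s\w]+//g;
    return $s;
}

# Keep word characters plus an explicit list: space, tab, newline, return.
sub keep_explicit {
    my ($s) = @_;
    $s =~ s/[^ \t\n\r\w]+//g;
    return $s;
}

my $line = "page one\fpage two\0end\n";    # embedded form feed and NUL
for my $result (keep_with_s($line), keep_explicit($line)) {
    (my $shown = $result) =~ s/\f/\\f/g;    # show the form feed as "\f"
    print $shown;
}
# prints:
# page one\fpage twoend
# page onepage twoend
```

The NUL disappears in both cases; only the explicit list also drops the
form feed.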

I am SURE there are better ways to do this, but for now, I'd best get
back to work. :-) I never did figure out the implicit ARGV thing...
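For the implicit ARGV approach, here is a minimal sketch, not Kim's
script: while (<>) opens each name in @ARGV in turn, $ARGV holds the
current file name, and eof (without parentheses) is true at the end of
each individual file. The sub name strip_files is hypothetical. It also
swaps the long character class for tr///cd, which keeps tab, newline,
return and printable ASCII (0x20-0x7E), roughly the same set. Caveat:
because <> opens the files itself, they are read in the default text
mode, so on Windows Kim's explicit binmode version is the safer choice
for .doc/.xls files. Empty input files produce no .stripped output.

```perl
use strict;
use warnings;

# Write "<name>.stripped" for every file given, using the implicit ARGV
# handle that <> manages.
sub strip_files {
    my @files = @_;
    return unless @files;      # an empty @ARGV would make <> read STDIN
    local @ARGV = @files;
    my $out;
    while (my $line = <>) {
        if (!$out) {           # first line of a new input file
            open($out, '>', "$ARGV.stripped")
                or die "Trouble opening $ARGV.stripped: $!\n";
        }
        $line =~ tr/\t\n\r\x20-\x7E//cd;    # c = complement, d = delete
        print $out $line;
        if (eof) {             # end of the current file, not of all files
            close($out) or die "Trouble closing $ARGV.stripped: $!\n";
            undef $out;
        }
    }
}

strip_files(@ARGV);
```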

Cheers,
Kim

-----Original Message-----
From: Mohanty, Debi (MED, TCS) [mailto:Debi.Mohanty@;med.ge.com]
Sent: Thursday, October 31, 2002 11:29 AM
To: [EMAIL PROTECTED]
Subject: deleting control Characters...... Please help


Hi,

        I have a data file (which I get from the mainframe) that
contains some sort of control characters. When I tried to copy those
characters in the file, I got a message saying
"cannot cut, copy or drag and drop text containing null (code=0)
characters". Is there any way I can clean these characters from the
file using Perl?
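If the goal is only to drop NUL and the other control characters while
leaving every other byte alone (including 0x80-0xFF), a single tr///
deletion is enough. A sketch under assumptions: remove_controls,
clean_file and clean.pl are hypothetical names, and reading fixed
64 KiB chunks via $/ = \65536 is a choice made because a mainframe
extract may not contain newlines at all.

```perl
use strict;
use warnings;

# Delete NUL and the other ASCII control characters (0x00-0x1F, 0x7F),
# except tab (0x09), newline (0x0A) and carriage return (0x0D).
sub remove_controls {
    my ($s) = @_;
    $s =~ tr/\x00-\x08\x0B\x0C\x0E-\x1F\x7F//d;
    return $s;
}

# Copy $in to $out in binary mode, removing control characters.
sub clean_file {
    my ($in, $out) = @_;
    open(my $fhin,  '<', $in)  or die "Trouble opening $in: $!\n";
    open(my $fhout, '>', $out) or die "Trouble opening $out: $!\n";
    binmode($fhin);
    binmode($fhout);
    local $/ = \65536;    # fixed-size reads; no line endings required
    while (my $chunk = <$fhin>) {
        print $fhout remove_controls($chunk);
    }
    close($fhin)  or die "Trouble closing $in: $!\n";
    close($fhout) or die "Trouble closing $out: $!\n";
}

# USAGE (hypothetical): perl clean.pl input.dat output.dat
clean_file(@ARGV) if @ARGV == 2;
```

Since tr/// works byte by byte, chunk boundaries don't matter.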

Thanks&Regards
Debi
_______________________________________________
ActivePerl mailing list
[EMAIL PROTECTED]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs