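If you've found the right character class, you can do something like:

while (<INFILE>) {
  # tr/// doesn't understand [...] classes or /g, so use s/// to
  # delete every character that is NOT in the class
  s/[^characterclass]//g;
  print OUTFILE $_;
}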

Putting a ^ at the beginning of a character class makes it match any character
that is NOT one of those in the brackets.  Otherwise, Perl has a predefined
POSIX class, [:print:] (written [[:print:]] inside a pattern, or [^[:print:]]
to match the non-printable ones), if you just want to get rid of non-printable
characters.
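A minimal sketch of that loop as a complete script, assuming you want to keep only printable ASCII plus tabs and line breaks. The sub names and the two command-line arguments are hypothetical. The /a flag (Perl 5.14 or later) restricts [[:print:]] to the ASCII range, so a byte like 0xFF is treated as junk regardless of locale or Unicode settings.

```perl
use strict;
use warnings;

# Remove every character that is not printable ASCII, keeping tabs and
# line breaks. /a (Perl 5.14+) limits [[:print:]] to ASCII, so bytes
# such as \xFF (the "ÿ" Notepad shows) are deleted.
sub strip_nonprintable {
    my ($line) = @_;
    $line =~ s/[^[:print:]\t\r\n]//ga;
    return $line;
}

# Copy the input file to the output file, filtering line by line.
# Both handles use :raw so Windows does no CRLF or Ctrl-Z translation.
sub clean_file {
    my ($in_path, $out_path) = @_;
    open my $in,  '<:raw', $in_path  or die "Cannot read $in_path: $!";
    open my $out, '>:raw', $out_path or die "Cannot write $out_path: $!";
    while (my $line = <$in>) {
        print {$out} strip_nonprintable($line);
    }
    close $in;
    close $out or die "Cannot close $out_path: $!";
}

# Hypothetical usage: perl clean.pl test.msg test.txt
clean_file(@ARGV) if @ARGV == 2;
```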

-----Original Message-----
From: Tim Booher [mailto:[EMAIL PROTECTED]]
Sent: Thursday, September 26, 2002 6:44 AM
To: 'Timothy Johnson'
Cc: [EMAIL PROTECTED]
Subject: RE: use perl to trim out non text characters from a file


I don't know if they are truly "valid, printable characters". When a text
file shows this type of information, isn't the ASCII just approximating some
binary data?

The reason I think so: if I open the file with Notepad, I get the
ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ described earlier, but if I do a simple

perl -ne "print" test.msg

I get only a single:
╨╧◄αí¦

(By the way, this is the same output I get when I run: c:\> type test.msg)
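One way to settle whether you're looking at text or binary data is to print the raw byte values. A minimal sketch, where hex_head is a hypothetical helper name: it opens the file with the :raw layer so Windows does no newline or end-of-file translation. If test.msg is an Outlook .msg file, the first bytes should be D0 CF 11 E0 A1 B1 1A E1, the signature of an OLE compound document, which would explain the ╨╧◄αí characters. The 0x1A (Ctrl-Z) that follows is probably why type and a text-mode read stop right there.

```perl
use strict;
use warnings;

# Return the first $count bytes of a file as space-separated hex pairs,
# so you see the actual byte values instead of how a console or
# Notepad chooses to draw them.
sub hex_head {
    my ($path, $count) = @_;
    open my $fh, '<:raw', $path or die "Cannot read $path: $!";
    my $got = read($fh, my $buf, $count);
    die "Cannot read $path: $!" unless defined $got;
    close $fh;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $buf;
}

# Hypothetical usage: perl hexhead.pl test.msg
print hex_head($ARGV[0], 16), "\n" if @ARGV;
```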

Even if I wanted to define a character class like [a-zA-Z], how would I
extract just those characters?
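Rather than keeping single characters, it often works better to pull out runs of the characters you want, the same idea as the Unix strings utility. A minimal sketch, assuming a run must be at least four characters long to count as text (an arbitrary threshold). text_runs and the particular character set are illustrative choices, not anything standard.

```perl
use strict;
use warnings;

# Return every run of at least $min consecutive wanted characters
# (ASCII letters, digits, space and some punctuation). Shorter runs are
# usually stray byte values inside binary data.
sub text_runs {
    my ($data, $min) = @_;
    $min //= 4;
    return $data =~ /([A-Za-z0-9 .,;:!?'"()\@-]{$min,})/g;
}

# Read the whole file as raw bytes and print one text run per line.
sub print_text_runs {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot read $path: $!";
    my $data = do { local $/; <$fh> };
    close $fh;
    print "$_\n" for text_runs($data, 4);
}

# Hypothetical usage: perl runs.pl test.msg
print_text_runs($ARGV[0]) if @ARGV;
```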

Thanks,

Tim

-----Original Message-----
From: Timothy Johnson [mailto:[EMAIL PROTECTED]] 
Sent: Thursday, September 26, 2002 8:36 AM
To: 'Tim Booher'; [EMAIL PROTECTED]
Subject: RE: use perl to trim out non text characters from a file


Someone out there may have a better answer, but this one seems tougher than
average because the character you're seeing is a valid, printable text
character.  I suppose one way to go would be to create a character class
with all of the characters that you want to allow.  Something like the
following:

[a-zA-Z0-9_\-=()\[\]\\\/'";:<>?.,!\@#\$%^&*+\s]    (The - is escaped so that
"_-=" isn't read as a range, $ and @ are escaped so they don't interpolate,
and \s keeps spaces and line breaks.)

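and then you could use s/[^...]//g to delete everything that isn't in the
class.  (tr/// doesn't understand [...] classes; with tr you'd list the same
characters without brackets and use tr/...//cd.)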
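The two approaches side by side, as a minimal sketch with hypothetical sub names, using Timothy's class with the hyphen, $ and @ escaped. tr/// takes a plain list of characters and ranges and does no variable interpolation, so the brackets are dropped and \s is spelled out as space, tab, CR and LF.

```perl
use strict;
use warnings;

# Version 1: s/// with a negated character class deletes every
# character that is not in the allowed set.
sub keep_allowed_s {
    my ($text) = @_;
    $text =~ s/[^a-zA-Z0-9_\-=()\[\]\\\/'";:<>?.,!\@#\$%^&*+\s]//g;
    return $text;
}

# Version 2: tr/// with the same characters listed out. /c complements
# the list (everything NOT listed) and /d deletes those characters.
sub keep_allowed_tr {
    my ($text) = @_;
    $text =~ tr/a-zA-Z0-9_\-=()[]\\\/'";:<>?.,!@#$%^&*+ \t\r\n//cd;
    return $text;
}
```

tr/// is usually the faster of the two for a fixed set of characters, while s/// is the one to reach for if you later want POSIX classes or other regex features.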

-----Original Message-----
From: Tim Booher [mailto:[EMAIL PROTECTED]]
Sent: Thursday, September 26, 2002 5:37 AM
To: [EMAIL PROTECTED]
Subject: use perl to trim out non text characters from a file


Hello, this should be really simple for the learned Perl programmer, but
I am trying to create a simple script to trim all the ‘junk’ out of my
email files. I get a lot of suspicious emails in Outlook and normally
drag them to the desktop and open them with Notepad. This works, but most of
the message is the following:
Ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ
 
Does anyone know what regex I could use to select only the text and trim
all the ‘junk’ from the .msg file? Also, do you know of an existing script
that already does this?
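If most of the junk really is the single byte 0xFF (which Notepad shows as ÿ), a narrower fix is to delete just that byte. A minimal sketch, assuming NUL bytes (0x00) are also unwanted filler; drop_filler and the file-name arguments are hypothetical. On Unix-like systems, the strings utility does a similar job of extracting readable text from a binary file.

```perl
use strict;
use warnings;

# Delete 0xFF filler bytes and NUL bytes, leaving every other byte as is.
sub drop_filler {
    my ($data) = @_;
    $data =~ tr/\xFF\x00//d;
    return $data;
}

# Read the whole input as raw bytes and write the filtered result.
sub main {
    my ($in_path, $out_path) = @_;
    open my $in, '<:raw', $in_path or die "Cannot read $in_path: $!";
    my $data = do { local $/; <$in> };
    close $in;
    open my $out, '>:raw', $out_path or die "Cannot write $out_path: $!";
    print {$out} drop_filler($data);
    close $out or die "Cannot close $out_path: $!";
}

# Hypothetical usage: perl dropfill.pl test.msg test.txt
main(@ARGV) if @ARGV == 2;
```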
 
Thanks,
 
Tim
 
 
