I've tried this wonderful command: hexdump -c file.txt and I found that I have to include more and more chars to erase, as follows:

    $text=~s/\177//g;
    $text=~s/\377//g;
    $text=~s/\335//g;
    $text=~s/\360//g;
    $text=~s/\204//g;
    $text=~s/\222/\n/g;
    $text=~s/\214//g;
    $text=~s/\216//g;
    $text=~s/\224//g;
    $text=~s/\240//g;
    $text=~s/\237//g;
    $text=~s/\234//g;
    $text=~s/\325//g;
    $text=~s/\351//g;
    $text=~s/\352//g;
    $text=~s/\355//g;
    $text=~s/\361//g;
    $text=~s/\362//g;
    $text=~s/\366//g;

Is there a way to erase all the chars whose code is higher than, say, 300 (octal, as shown in the dump)?
Does this make sense?
Thank you again!

On Oct 3, 2008, at 7:33 AM, Brian Raven wrote:

> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of
> Alejandro Santillan Iturres
> Sent: 02 October 2008 19:45
> To: activeperl@listserv.ActiveState.com
> Cc: [EMAIL PROTECTED]
> Subject: Re: regexp to "clean" a text file
>
>> Thank you William, Bill and Tim. Finally s/[\x00-\x1f]//g did the
>> trick, almost perfect.
>> The original file is the Palm database of memo pads. The text is
>> there, plain. Several mixed control characters were present.
>> The system I am working on is a Fedora Linux box. I have no hex utility
>> installed to make the dump, so I don't know if the ^E is really a ^E.
>
> I find that a little hard to believe. Try 'hexdump', or if that isn't
> present you should at least have 'od'. If neither of them is installed,
> your Linux installation sounds a bit broken. Unless you can identify
> which characters are to be kept or discarded, you will find it difficult
> to 'clean' your data effectively.
>
> HTH
>
> --
> Brian Raven
>
> -----------------------------------------------------------------------
> This e-mail may contain confidential and/or privileged information.
> If you are not the intended recipient or have received this e-mail
> in error, please advise the sender immediately by reply e-mail and
> delete this message and any attachments without retaining a copy.
> Any unauthorised copying, disclosure or distribution of the material
> in this e-mail is strictly forbidden.
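
The chain of single-byte substitutions at the top of this message can probably be collapsed into one step by naming the characters to keep instead of the ones to drop. What follows is only a sketch under assumptions: it treats $text as a string of bytes, assumes that only tab, newline and printable ASCII (octal 040 to 176) are worth keeping, and uses a helper name, clean_text, that is made up for illustration. It keeps the \222-to-newline mapping from the list above. Note that several of the bytes listed there (\204, \214, \222 and so on) are below octal 300, so a cutoff of 300 would miss them; anything above \176 seems the safer threshold.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): keeps tab, newline and
# printable ASCII, and deletes every other byte.
sub clean_text {
    my ($text) = @_;

    # The original list turned \222 into a newline, so do that first.
    $text =~ s/\222/\n/g;

    # tr with /c (complement) and /d (delete) removes every character
    # that is NOT in the list: tab (011), newline (012), 040 through 176.
    $text =~ tr/\011\012\040-\176//cd;

    return $text;
}

# Small made-up sample containing some of the bytes from the list above.
my $sample = "Memo\222one\377\335 two\177\005";
print clean_text($sample), "\n";
```

The same idea can be written as a regex, s/[^\x09\x0A\x20-\x7E]//g, if that reads more naturally. Either form also throws away accented letters and any other non-ASCII text, so it only fits if the memos are expected to be plain ASCII.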

_______________________________________________
ActivePerl mailing list
ActivePerl@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs