hi anonymous --

the first thing to do is to be very clear about exactly which characters you are trying to eliminate, and which you are trying to keep! you do not say what character set you are dealing with (ascii, utf8, utf16, etc.); it would be nice to know this. it would also be nice to know the operating system and perl version you are working with.

one way to find out about the actual characters is to use a hex dump utility of some kind. is what displays in my e-mail as "^E" (caret-E) really a caret character followed by an upper-case E character, or is it a control-E (ascii 0x05 "ENQ")? likewise, is "^@" (caret-@) a control-@ (ascii 0x00 "NUL") character? what about all the whitespace that surrounds these characters in my e-mail: is that really there?

another important step is to familiarize yourself with regex syntax: perlre, perlretut and perlrequick are important here. one quick point is that the regex in s/![a-zA-Z][0-9]//g does not negate the character classes that follow the "!": the "!" character is not special in a regex, it is literally an exclamation mark, so the pattern matches only a "!" followed by one letter and then one digit. you might want something like s/[^a-zA-Z0-9]//g instead; however, this will also delete the accented characters you say you want to keep (along with whitespace and punctuation).

if you just want to eliminate ascii control characters, the regex s/[\x00-\x1f]//g would, i think, do the trick. note that this range includes tab (0x09), so tabs will be deleted too. try something like

    perl -i.bak -lpe "s/[\x00-\x1f]//g" input.file

on a COPY (and in a separate directory) of the file you are trying to fix. (i am assuming you are running windows.)

hth -- bill walters
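
p.s. here is a rough sketch of the points above, in case it helps to see them side by side. the sample string and the sub names are made up for illustration (i have not seen your actual data), and the last sub, which keeps tab, LF and CR, is my own variation rather than anything from your original question. results are printed as hex so the control characters stay visible.

```perl
use strict;
use warnings;

# hypothetical sample: "caf" + e-acute (latin-1 0xe9), control-E (0x05),
# a space, "a", NUL (0x00), "b", a tab, then the literal text "!c9"
my $sample = "caf\xe9\x05 a\x00b\t!c9";

# poor man's hex dump: shows whether "^E" is really 0x05 or the two
# characters 0x5e 0x45
sub hex_dump {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', ord($_) } split //, $bytes;
}

# "!" is literal here: removes only "!" + one letter + one digit
sub strip_bang_pattern {
    my ($s) = @_;
    $s =~ s/![a-zA-Z][0-9]//g;
    return $s;
}

# negated class: removes everything except ascii letters and digits,
# which includes accented characters, whitespace and punctuation
sub strip_non_alnum {
    my ($s) = @_;
    $s =~ s/[^a-zA-Z0-9]//g;
    return $s;
}

# removes all ascii control characters 0x00-0x1f (tab included)
sub strip_controls {
    my ($s) = @_;
    $s =~ s/[\x00-\x1f]//g;
    return $s;
}

# assumption: a variant that keeps tab (0x09), LF (0x0a) and CR (0x0d)
sub strip_controls_keep_ws {
    my ($s) = @_;
    $s =~ s/[\x00-\x08\x0b\x0c\x0e-\x1f]//g;
    return $s;
}

printf "%-24s %s\n", 'original:',           hex_dump($sample);
printf "%-24s %s\n", 'bang pattern:',       hex_dump(strip_bang_pattern($sample));
printf "%-24s %s\n", 'non-alnum removed:',  hex_dump(strip_non_alnum($sample));
printf "%-24s %s\n", 'controls removed:',   hex_dump(strip_controls($sample));
printf "%-24s %s\n", 'controls, keep ws:',  hex_dump(strip_controls_keep_ws($sample));
```

in the output, the 0xe9 byte survives both control-character versions but not the [^a-zA-Z0-9] version, which is the reason i would not use the negated class on your file.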
_______________________________________________ ActivePerl mailing list ActivePerl@listserv.ActiveState.com To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs