On 20 Mar 2007 at 12:55, Chas Owens wrote:

> On 3/20/07, Beginner <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > I have a large, 1.3GB xml file that I was trying to validate. It
> > turns out that the file has a lot of exotic characters in it such as:
> > é
> > è
> > Ä
> > È
> > ...etc
> > Being a lazy kinda guy, I thought I would cat the file and let perl
> > make the substitutions where it found any of these characters. My
> > problem is I am not sure how to regex for these characters except to
> > look for the hex value. Neither do I know of a way to escape/encode
> > them correctly.
> >
> > I have seen the pragma utf8 but I am not sure my problem is what this
> > pragma was designed for. Does anyone have any suggestions for a
> > module or method that might take some of the pain out of detecting
> > and escaping such characters?
>
> Be a truly lazy guy and use iconv.

The hex idea might work. If I can locate all the characters in a
file/string whose hex values fall in a given range, I could then
substitute them. Perhaps something like this would do it (feel free to
correct me if I am wrong):

s/\xc9/'&#'.$1.';'/ # Hoping for a numeric character reference (e.g. &#201;) from É

However, it doesn't feel like the best approach.

The iconv route hasn't been too successful either. I tried

Text::Iconv->new('ISO8859-1','utf8');

thinking that my data is currently ISO8859-1, but the results were not
as I had hoped. Where I had MICROSCÓPIO, I got MICROSCÃPIO.

If I can't convert, perhaps I need to XHTML-escape them, so
MICROSCÓPIO would become MICROSC&#211;PIO.

Is there a module that can parse and substitute in this way, or do I
need to roll my own?

TIA,
Dp.
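
For reference, here is a minimal sketch of the hex-range substitution described above. It assumes the input really is ISO-8859-1 and uses the core Encode module; the helper name escape_non_ascii is hypothetical. The one-liner as posted has no capture group and no /e modifier, so $1 would be empty and the replacement would not be evaluated as code. The sketch captures each non-ASCII character and uses ord() to get its code point.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: replace every character outside the ASCII range
# with a decimal numeric character reference (U+00C9 becomes "&#201;").
# It expects a decoded character string, not raw bytes.
sub escape_non_ascii {
    my ($text) = @_;
    # /e evaluates the replacement as Perl code, /g repeats the match.
    $text =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/ge;
    return $text;
}

# Assumption: the data is ISO-8859-1, where byte 0xD3 is "Ó".
my $bytes = "MICROSC\xD3PIO";
my $chars = decode('ISO-8859-1', $bytes);

print escape_non_ascii($chars), "\n";    # prints MICROSC&#211;PIO

# The same character re-encoded as UTF-8 is the two bytes C3 93.
my $utf8 = encode('UTF-8', "\x{D3}");
print join(' ', map { sprintf '%02X', ord } split //, $utf8), "\n";    # prints C3 93
```

The second print may explain the MICROSCÃPIO result: byte 0xC3 is "Ã" in ISO-8859-1, so a correct UTF-8 conversion viewed in a Latin-1 terminal or editor would look like that. For a file of this size, reading line by line from a filehandle opened with the '<:encoding(ISO-8859-1)' layer would avoid loading everything into memory.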