The dump will contain either literal ASCII question marks, because Oracle decided to clip the high bits during export, or mangled byte sequences that don't correspond to any character your screen can display.
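A quick way to see which of those two cases you are in is to tally the bytes in the dump. The following is only a rough sketch: the function names are made up for illustration, and it assumes that any byte at or above 0x80 is a candidate mangled character. A real exp dump also contains binary header data, and plenty of legitimate '?' characters, so treat the numbers as a hint rather than a diagnosis.

```python
from collections import Counter


def tally_bytes(chunks):
    """Count '?' bytes and each byte value >= 0x80 across an iterable of bytes chunks."""
    question_marks = 0
    high = Counter()
    for chunk in chunks:
        question_marks += chunk.count(b"?")
        # Iterating over a bytes object yields integers 0..255.
        high.update(b for b in chunk if b >= 0x80)
    return question_marks, high


def read_chunks(path, size=1 << 20):
    """Yield a file's contents in binary chunks so a large dump is never loaded whole."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(size)
            if not chunk:
                break
            yield chunk


def report(path):
    """Print a summary of '?' bytes and high bytes for the file at path (hypothetical helper)."""
    question_marks, high = tally_bytes(read_chunks(path))
    print(f"'?' bytes: {question_marks}")
    print(f"bytes >= 0x80: {sum(high.values())}")
    for value, count in high.most_common():
        print(f"  0x{value:02x}: {count}")
```

Calling report("my.dump") prints the totals and a per-byte breakdown. If there are no high bytes at all and the accented characters are missing, the data was probably already turned into question marks on the way out.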
If Oracle clipped, you're out of luck. Otherwise, you can fix this the wrong way and go to bed by using tr, sed, or emacs query-replace. If your data is mainly English with some special characters, such as the é in resumé or Microsoft 'smart' quotes, it shouldn't be too hard to identify which mangled bytes should be translated into which UTF-8 characters. You can get a rough handle on the size of the problem by counting the lines that contain anything other than plain ASCII letters, digits, punctuation, and whitespace, with something like:

    LC_ALL=C grep -U -c '[^[:alnum:][:punct:][:space:]]' my.dump

On 7/19/05, Janine Sisk <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I've just made a mess for myself and I'm hoping someone will know how
> to fix it. It's really more of an Oracle problem and the message below
> is a modified version of one I just sent to an Oracle list, but I
> thought perhaps someone here would have already struggled with it.
>
> I took a site that was running under 8.1.7.4 and moved it to 9.2.0.4
> (both on RedHat Linux) using exp/imp. I didn't specify a character set
> in either case. The data has accented characters and they have been
> working fine in 8.1.7.4.
>
> Now, it seems that the default setting of NLS_CHARACTERSET in 8.1.7.4
> was US7ASCII and in 9.2.0.4 it's WE8ISO8859P1. Everything I've read
> about this conversion says that since it's going from 7 bit to 8 bit
> there shouldn't be any data problems. Well, hah! :) We didn't spot
> any at first, but now that the client is looking closely he's finding
> pages all over the place that have ?? where accented characters should
> be.
>
> The problem was even worse at first; some characters displayed ok
> until you edited the page via the web browser, and then they turned
> into ?? as well. I was able to fix that, as far as I can tell, by
> setting NLS_CHARACTERSET to WE8ISO8859P1 in the environment of the user
> running the site. It has not, unfortunately, helped us with the rest
> of the mess.
>
> AOLserver is configured to use iso-8859-1 for its charset and has been
> all along. The only thing that has changed here is the Oracle version
> and its charset. I have this in the ns/parameters section:
>
> ns_param HackContentType 1
> ns_param DefaultCharset iso-8859-1
> ns_param HttpOpenCharset iso-8859-1
> ns_param OutputCharset iso-8859-1
> ns_param URLCharset iso-8859-1
>
> Going back and reimporting the data is a last resort, as we'd either
> lose or recreate user data that has been entered since the site was
> moved on Sunday night. Is there anything else I can do to fix this?
>
> In short, heeeeeeelp! :)
>
> thanks,
>
> janine
>
> --
> AOLserver - http://www.aolserver.com/
>
> To Remove yourself from this list, simply send an email to <[EMAIL PROTECTED]> with the
> body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject:
> field of your email blank.
