Below is the update I just sent to the Oracle list, in case it sparks
any ideas here.
I have no objection to UTF-8 except that converting to it is
problematic, according to Ask Tom, because everything takes up more
space and things can overflow their storage (something else Oracle
should handle, but presumably does not). So for the moment I'll settle
for any character set that works, and worry about the Euro later. :)
janine
Thanks to everyone who e-mailed me with suggestions and info from
metalink and other sources. It was all good stuff, but ultimately not
the right stuff. I tried various scenarios:
- export from old database with NLS_LANG set to US7ASCII, import to new
database with NLS_LANG set to USASCII, let imp do the conversion
- export from old database with NLS_LANG set to WE8ISO8859P1, import to
new database with NLS_LANG set to WE8ISO8859P1, let exp do the
conversion
but nothing worked. I ultimately found a white paper from Oracle (this
is the HTML version of their PDF, from google):
http://66.102.7.104/search?q=cache:1shB3SfIDNkJ:www.oracle.com/
technology/tech/globalization/pdf/
TWP_Character_Set_Migration_Best_Practices_10gR2.pdf+oracle+export+us7as
cii++8-bit&hl=en&start=1&client=firefox-a
which talks about what a big job this is. As far as I can tell, the
problem is that Oracle lets you store 8-bit characters with a 7-bit
character setting, but when you do any sort of conversion it forgets
about that and drops the 8th bit.
I then tried a different approach. I set up a second database on the
8i system, still in US7ASCII. I loaded my data into it, and verified
that this time it loaded and displayed correctly. Then I tried
changing the character set of the database. I thought this might
perhaps use different conversion code and maybe it was done correctly,
but no joy here either. I first tried WE8ISO8859P1, then I tried
WE8MSWIN1252, which is specifically for those funky characters from MS
Word, and I even tried UTF8. The first two gave the same result as my
imp/exp trials, which is corruption of all the 8-bit characters, and
the last corrupted the data completely so all I got was a row of
question marks and nothing readable at all. And yes, I was creating
the database over from scratch in US7ASCII and reloading the data each
time (it's been a long night.... :)
I need to read that conversion doc from Oracle in more detail, but what
I got from skimming over it was basically that one should use tools to
remove "problem" data before doing the conversion. In this case it
would be the same data that's getting corrupted. I'm not getting the
impression that this data can be saved....
One thing just occurred to me. If I set up a second database on the 9i
system in US7ASCII and loaded the dump from the 8i system, and
connected to it from our application with NLS_LANG set to US7ASCII, I
would probably get good data. The problem here is that I have no way
to set up a connection to the target database at the same time with a
different NLS_LANG value; they will both use the same environment
variable and I know of no way to override it.. Drat, I thought I was on
to something for a second there! :)
If I'm missing, overlooking or otherwise misunderstanding something,
please let me know. At this point the alternatives are looking a bit
bleak.....
janine
--
AOLserver - http://www.aolserver.com/
To Remove yourself from this list, simply send an email to <[EMAIL PROTECTED]>
with the
body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject:
field of your email blank.