[EMAIL PROTECTED] wrote: > Hi Tim, > > I have spent about three hours using Google and Metalink trying to > find the > > answer to what must be a FAQ when generating XML from Oracle but I > cannot find anything that answers the question. > > I work for a company in the UK and we are generating an XML file from > an Oracle 9.2 database using DBMS_XMLQUERY and DBMS_LOB. The XML file > has to be > in UTF-8 format (encoding="utf-8") but this is not the format that > it is held in the database. From what I have read, it seems that it > is AL32UTF8.
Oracle's utf-8 support for the XML packages is shameful. http://www.dbforums.com/showthread.php?t=1212787. For any of this to happen automatically, you need to make sure that your original data (what you're making the XML from) is stored correctly in the database's character set. This means that if your data actually contains utf-8, the database characterset should be AL32UTF8. It appears your data is indeed in utf-8, because 49827 (0xC2A3) is the utf-8 representation of the pound sign. You can determine your database's character set with this query: select value from v$nls_parameters where parameter = 'NLS_CHARACTERSET' Assuming your database characterset actually matches the data that's in it, you can just set the client character set to your desired output, and the encoding is done for you: # data from Oracle will now be converted correctly # into Perl's internal encoding. export NLS_LANG=AMERICAN_AMERICA.AL32UTF8 # in perl... # convert from Perl's internal encoding to utf-8 when # printing to stdout. binmode(STDOUT, ':utf8'); # convert from Perl's internal to utf-8 If your database characterset does not match the data (for example, your database characterset is US7ASCII) then sorry, you will have to manually convert each unicode column. If you're using Oracle 10G, then there are functions to convert XML encodings explicitly using character set Ids. Otherwise, you'll have to figure out the right hocus pocus to transfer the data from Oracle to Perl without losing character information, and then encode/decode in Perl with the Encode module. If you haven't already, I recommend reading the Oracle 9i Globalization Best Practices document. It can help get your head around how the character sets work. http://tinyurl.com/mtsxg [oracle.com]. Hope it helps. Regards, Philip
