Re: Extended Unicode Characters

J.Pietschmann Mon, 07 Oct 2002 12:03:13 -0700

Al-Dhahir, Haitham wrote:
> J.Pietschmann is right in saying that the characters are coming from the
> database - I am storing them there, pulling them out and inserting them into
> my XML which is then being converted using FOP. E.g. I am storing "&#x00F0;"
> in the database, which I then hope will become the Icelandic eth character.
> 
> This database storage seems to be the issue, as when I inserted &#x00F0;
> directly into my XML this was correctly interpreted as the eth character and
> it showed up in my PDF. However, when I pull this from the DB, it remains as
> &#x00F0;.
> 
> Why is this, and is there anything I can do about it?
> 
To read XML from a file, you normally use an XML parser.
The parser converts character references into Unicode characters,
and those characters are what the usual parser APIs hand back. If you
pull strings from a DB and insert them into a DOM Document or pass them
to a ContentHandler, no parser is involved, so character references
are not converted (the same happens to strings resembling XML
fragments: they are still strings and are not automagically converted
to XML elements).
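
A minimal sketch of the difference, assuming a standard JAXP parser
is available. The class name CharRefDemo, the wrapper element <t> and
both helper methods are made up for illustration and are not part of
FOP or of your code:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class: compares parsed markup with direct DOM insertion.
class CharRefDemo {

    // The string is embedded in markup and read by a parser,
    // so the parser resolves the character reference.
    static String viaParser(String fromDb) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        String xml = "<t>" + fromDb + "</t>";
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    // The string is inserted as a text node; no parser ever sees it,
    // so it stays exactly as it was stored.
    static String viaDom(String fromDb) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.newDocument();
        Element t = doc.createElement("t");
        t.appendChild(doc.createTextNode(fromDb));
        doc.appendChild(t);
        return t.getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String fromDb = "&#x00F0;"; // what the DB column contains
        System.out.println(viaParser(fromDb).length()); // 1 (the eth character)
        System.out.println(viaDom(fromDb).length());    // 8 (the literal reference)
    }
}
```

The second case is what happens when DB strings go straight into
the document tree.
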
If you can, insert the characters you have to deal with directly
into the DB (this is also good for searches). If you run afoul of the
DB's character encoding or hit API issues, try storing your
strings as a UTF-16 encoded BLOB, or run your own escape/unescape
routines.
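
As a rough sketch of such escape/unescape routines, assuming only
numeric character references (decimal or hexadecimal) need handling.
The class CharRefs and its method names are hypothetical; named
entities such as &amp; are deliberately left alone:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: converts between numeric character references
// and the Unicode characters they stand for.
class CharRefs {

    // Matches hexadecimal (&#xF0;) and decimal (&#240;) references.
    private static final Pattern REF =
        Pattern.compile("&#(?:[xX]([0-9A-Fa-f]{1,6})|([0-9]{1,7}));");

    // Replaces each numeric character reference with its character.
    static String unescape(String s) {
        Matcher m = REF.matcher(s);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(s, last, m.start());
            int cp = (m.group(1) != null)
                ? Integer.parseInt(m.group(1), 16)
                : Integer.parseInt(m.group(2));
            if (Character.isValidCodePoint(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append(m.group()); // out of Unicode range: keep as is
            }
            last = m.end();
        }
        out.append(s, last, s.length());
        return out.toString();
    }

    // Replaces every non-ASCII character with a hexadecimal reference.
    // Note: '&' is not escaped, so text that already contains a literal
    // reference will not survive an escape/unescape round trip unchanged.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp < 0x80) {
                out.append((char) cp);
            } else {
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase())
                   .append(';');
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String stored = "&#x00F0;";        // as stored in the DB
        String text = unescape(stored);    // now a single eth character
        System.out.println(text.length()); // 1
        System.out.println(escape(text));  // &#xF0;
    }
}
```

Call unescape on each string after reading it from the DB and before
inserting it into the DOM or passing it to the ContentHandler.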


Keywords for further reading: character encoding, Java strings
(internal representation), DOM and SAX API specs.

J.Pietschmann


