Hi Kapt,

"TrISch-6TTr-IS VKS (Kapt Vekemans)" wrote:

> Setting the character encoding in the XML header is the right way to do
> this, but I believe you must keep 3 things in mind:
>
> 1. Is the name that I use for my encoding (in your case 'Big5') the correct
> and internationally standardized name for that encoding? (Looking at the
> different ISO specifications can help here.) E.g. encoding="ASCII" may
> not work, whereas encoding="US-ASCII" should.
>

Big5 is correct.
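
One quick way to sanity-check an encoding label is to ask the JVM what it
makes of it. This is only a sketch: the class and method names are made up
for illustration, and it checks the JVM's charset registry, which I am
assuming (not claiming) lines up with what a given XML parser accepts.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of Jetspeed, Xerces or Xalan.
class EncodingNameCheck {
    // Returns the canonical name the JVM uses for the label,
    // or null if the label is unknown or not a legal charset name.
    static String canonicalName(String label) {
        try {
            return Charset.isSupported(label) ? Charset.forName(label).name() : null;
        } catch (IllegalArgumentException e) {
            // Thrown for syntactically illegal names (IllegalCharsetNameException).
            return null;
        }
    }

    public static void main(String[] args) {
        for (String label : new String[] {"ASCII", "US-ASCII", "Big5", "ISO-8859-1"}) {
            System.out.println(label + " -> " + canonicalName(label));
        }
    }
}
```

A null result only means this particular JVM does not know the label; it
says nothing about whether the name is registered elsewhere.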

>
> 2. Is the parser, which will process the XML, up to the job of recognizing
> this character encoding? If a parser doesn't recognize the encoding, but can
> read the first couple of bytes (of the string "<?xml version= ..." - that's
> one of the reasons it is there) it will normally conclude the encoding
> is UTF-8. I don't know if Xerces can recognize the Big5 encoding.

This seems to be the key point, but the issue lies with Xalan rather than
Xerces!  Xalan doesn't seem to handle the Big5 encoding (can anyone confirm?),
so the output text is converted to character references.

Xalan actually follows the XSLT specification exactly but doesn't implement
the Big5 encoding. According to the XSLT spec: "It is possible that the result
tree will contain a character that cannot be represented in the encoding that
the XSLT processor is using for output. In this case, if the character occurs
in a context where XML recognizes character references (i.e. in the value of an
attribute node or text node), then the character should be output as a character
reference; otherwise (for example if the character occurs in the name of an
element) the XSLT processor should signal an error."
(http://www.w3.org/TR/xslt#IANA). As a result, when Xalan performs the
transformation and encounters a Big5 character, it converts each one into a
character reference.
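
To make the rule quoted above concrete, here is a minimal sketch of the
fallback it describes: emit a character as-is when the output encoding can
represent it, otherwise emit a numeric character reference. This is my own
illustration, not Xalan's code; the class and method names are hypothetical,
and it deliberately ignores markup escaping (&, <, >).

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the spec's fallback, not Xalan's serializer.
class CharRefFallback {
    // Replaces every code point the output charset cannot encode
    // with a decimal character reference such as &#20013;
    static String escapeUnencodable(String text, Charset outputCharset) {
        CharsetEncoder encoder = outputCharset.newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            String s = new String(Character.toChars(cp));
            if (encoder.canEncode(s)) {
                out.append(s);
            } else {
                out.append("&#").append(cp).append(';');
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+4E2D cannot be represented in US-ASCII, so it becomes a reference.
        System.out.println(escapeUnencodable("A\u4E2D", StandardCharsets.US_ASCII));
    }
}
```

With an output encoding that can represent the character (UTF-8, for
example), the text passes through unchanged.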

Solution? Yep, replace Xalan with another XSLT processor.  :-)
Another alternative, which I applied, is to tell Xalan not to convert anything
for me by disabling output escaping, e.g. <xsl:value-of
select="given-name" disable-output-escaping="yes"/>. However, this method has
its drawback: malicious content in the RSS may destroy your page. For instance,
the RSS element <title>ABC&lt;td&gt;</title> may become <td>ABC<td></td>
after transformation.
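
The risk is easy to reproduce outside XSLT. The sketch below builds a table
cell from a title with and without escaping; the class and method names are
invented for the example and it is not how RSSPortlet builds its output.

```java
// Hypothetical illustration of why unescaped output is risky.
class MarkupEscape {
    // Escapes the three characters that matter inside element content.
    static String escapeText(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    // Wraps a title in a table cell, optionally escaping it first.
    static String cell(String title, boolean escape) {
        return "<td>" + (escape ? escapeText(title) : title) + "</td>";
    }

    public static void main(String[] args) {
        String title = "ABC<td>";                 // parsed value of <title>ABC&lt;td&gt;</title>
        System.out.println(cell(title, false));   // markup from the feed leaks into the page
        System.out.println(cell(title, true));    // markup is rendered as text
    }
}
```

With escaping switched off, the stray <td> from the feed becomes part of the
page structure, which is exactly the breakage described above.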

Does anyone have a better, more complete solution?

>
> 3. Finally, the application (e.g. Netscape) used to view the characters must
> know how to represent these characters. Its job is to convert the character
> data (i.e. numbers) to a visual representation (a.k.a. glyphs).

Yep, IE didn't do the conversion. Surprised and disappointed ;-(
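
For what it's worth, the symptom from my original report (two symbols instead
of one Chinese character) can be reproduced by decoding the same two bytes,
A4 A4, under two different charsets. A small sketch, with a made-up class
name, assuming the JVM ships a Big5 charset (the code checks before using it):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: the same bytes read under two different encodings.
class DecodeDemo {
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xA4, (byte) 0xA4};

        // Each byte maps to its own character (U+00A4, the currency sign).
        String asLatin1 = decode(raw, StandardCharsets.ISO_8859_1);
        System.out.println("ISO-8859-1: " + asLatin1.length() + " characters");

        // Big5 is not guaranteed to be present in every JVM, so check first.
        if (Charset.isSupported("Big5")) {
            String asBig5 = decode(raw, Charset.forName("Big5"));
            System.out.println("Big5: " + asBig5.length() + " character(s)");
        }
    }
}
```

Once the bytes have been read as ISO-8859-1 anywhere in the chain, each A4
is a separate currency sign, and serializing it as &curren; is the natural
result.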

>
> I hope this helps, comments are welcome.

Thanks for your helpful guidelines.


> Tom Vekemans
>
> -----Original Message-----
> From: William Leung [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, August 22, 2000 12:19 PM
> To: JetSpeed
> Subject: How to handle multibyte (multilingual) in RSSPortlet?
>
> I am not sure how many people here have tried to display a
> language other than English in an RSS file. In my RSS file, I
> declared <?xml version="1.0" encoding="Big5"?> as I used
> Chinese (Big5 encoding) in the content. I have an item written
> in Chinese like
> <item>
> <title>CHINESEWORD</title>
> .....</item>
>
> CHINESEWORD is a double-byte word and its value equals A4
> A4 in hex.
>
> The page generated by RSSPortlet and transformed by XSL
> resulted in <td>&curren;&curren;</td>. The problem comes when I
> view the page with IE (5.5). It displays two symbol
> characters, each representing the value A4 (&curren;), rather
> than one Chinese word as I expected.
>
> Netscape displays it correctly if the character set is set to
> ISO-8859-1 but not when the character set is set to "Big5"...
>
> I know it does NOT seem to be a bug in RSSPortlet; I just want to know
> the correct way to handle different character sets in
> RSS (and XML).
>
> Thanks in advance.
>
> --
> Regards,
> William Leung
>
> --
> --------------------------------------------------------------
> Please read the FAQ! <http://java.apache.org/faq/>
> To subscribe:        [EMAIL PROTECTED]
> To unsubscribe:      [EMAIL PROTECTED]
> Archives and Other:  <http://java.apache.org/main/mail.html>
> Problems?:           [EMAIL PROTECTED]
>

--
Regards,
William Leung



