Setting the character encoding in the XML header is the right way to do
this, but I believe you must keep 3 things in mind:

1. Is the name that I use for my encoding (in your case 'Big5') the correct,
officially registered name for that encoding? (The IANA character set
registry, which the XML specification refers to, can help here.) E.g.
encoding="ASCII" may not work, whereas encoding="US-ASCII" should.

2. Is the parser, which will process the XML, up to the job of recognizing
this character encoding? A parser reads the first couple of bytes (of the
string "<?xml version= ..."; that's one of the reasons it is there) to work
out how to read the encoding declaration. If there is no declaration it will
normally assume UTF-8; if it does not support the declared encoding, the XML
specification treats that as a fatal error rather than something to guess
around. I don't know if Xerces can recognize the Big5 encoding.

3. Finally, the application (e.g. Netscape) used to view the characters must
know how to represent these characters. Its job is to convert the character
data (i.e. numbers) to a visual representation (a.k.a. glyphs).
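
To make points 1 to 3 concrete, here is a small sketch. It is in Python
rather than the Java that Jetspeed and Xerces use, purely because it is short;
canonical_name is a hypothetical helper of mine, not part of any library. It
checks whether the runtime recognizes an encoding name, then decodes the two
bytes A4 A4 from the question under two different assumptions.

```python
import codecs

# The two bytes from the original question: A4 A4 in hex.
raw = bytes([0xA4, 0xA4])


def canonical_name(label):
    """Return this runtime's canonical codec name for label, or None if unknown.

    Hypothetical helper for illustration only.
    """
    try:
        return codecs.lookup(label).name
    except LookupError:
        return None


# Points 1 and 2: is the encoding name recognized by this runtime at all?
print(canonical_name("Big5"))
print(canonical_name("no-such-charset"))

# Point 3: the same numbers give different characters depending on
# which encoding the reader assumes.
as_big5 = raw.decode("big5")          # one double-byte character
as_latin1 = raw.decode("iso-8859-1")  # two single-byte characters

print(len(as_big5), [hex(ord(c)) for c in as_big5])
print(len(as_latin1), [hex(ord(c)) for c in as_latin1])
```

Read as Big5, the pair is a single character (U+4E2D); read as ISO-8859-1,
it is two currency signs (U+00A4), which is what &curren;&curren; in the
generated page amounts to. So it looks as if the bytes are being treated as
ISO-8859-1 somewhere between the parser and the browser, though I can't say
from here at which step.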

I hope this helps; comments are welcome.

Tom Vekemans

-----Original Message-----
From: William Leung [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, August 22, 2000 12:19 PM
To: JetSpeed
Subject: How to handle multibyte (multilingual) in RSSPortlet?


I am not sure how many people here have tried to display a
language other than English in an RSS file. In my RSS file, I
declared <?xml version="1.0" encoding="Big5"?> because I used
Chinese (Big5 encoding) in the content. I have an item written
in Chinese like
<item>
<title>CHINESEWORD</title>
.....</item>

CHINESEWORD is a double-byte character and its value equals A4
A4 in hex.

The page generated by RSSPortlet and transformed by XSL
contains <td>&curren;&curren;</td>. The problem comes when I
view the page with IE (5.5): it displays two symbol
characters, each representing the value A4 (&curren;), rather
than one Chinese character as I expected.

Netscape displays it correctly if the character set is set to
ISO-8859-1, but incorrectly when the character set is "Big5"...

I know this does NOT seem to be a bug in RSSPortlet; I just
want to know the correct way to handle different character
sets in RSS (and XML).

Thanks in advance.

--
Regards,
William Leung




--
--------------------------------------------------------------
Please read the FAQ! <http://java.apache.org/faq/>
To subscribe:        [EMAIL PROTECTED]
To unsubscribe:      [EMAIL PROTECTED]
Archives and Other:  <http://java.apache.org/main/mail.html>
Problems?:           [EMAIL PROTECTED]

