Re: Entities

Klaus Malorny Wed, 15 Nov 2006 14:55:52 -0800

[EMAIL PROTECTED] wrote:

 > > I removed the "encoding", but am still getting the same result.  (The
 > > source file is plain old ASCII but also using several of the characters
 > > in the range 128-255.  I'm not getting any problem with them.)
 >
 > Why don't you try the encoding appropriate to the characters you use?
Olek's right. If you have characters above 127, it isn't "plain old ASCII". In fact, if you have bytes in that range, XML tools (which generally default to UTF-8) will probably think you're trying to specify a multibyte character sequence, so you *definitely* need to specify an encoding.

Real 7-bit ASCII is a proper subset of UTF-8. As soon as you get out of that range, you need to either use an encoding that the XML parser knows how to auto-recognize (UTF-8 or UTF-16), or state your encoding explicitly. Or both.
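
To illustrate the point about bytes outside the 7-bit range, here is a minimal Java sketch. The class name EncodingDemo and the helper strictDecode are made up for this example; only the java.nio.charset calls are standard. It decodes the same bytes strictly under two encodings: pure ASCII bytes are accepted as UTF-8 unchanged, while a lone byte from the 128-255 range is rejected as an incomplete multibyte sequence.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class EncodingDemo {

    /**
     * Hypothetical helper: decodes the bytes strictly and returns null
     * when they are not valid in the given charset.
     */
    static String strictDecode(byte[] bytes, Charset charset) {
        try {
            return charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] ascii = {'a', 'b', 'c'};
        // "caf" followed by byte 0xE9, which is e-acute in ISO-8859-1
        byte[] latin1 = {'c', 'a', 'f', (byte) 0xE9};

        // 7-bit ASCII is valid UTF-8 as-is: prints "abc"
        System.out.println(strictDecode(ascii, StandardCharsets.UTF_8));
        // 0xE9 starts a three-byte UTF-8 sequence that never completes: prints "null"
        System.out.println(strictDecode(latin1, StandardCharsets.UTF_8));
        // The same bytes decode cleanly once the real encoding is stated
        System.out.println(strictDecode(latin1, StandardCharsets.ISO_8859_1));
    }
}
```

An XML parser that assumes UTF-8 runs into the second case, which is why the encoding declaration matters as soon as such bytes appear in the file.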

As far as I have followed the thread, I think Graeme's problem is less a parsing problem than a question of how to turn the U+010D character back into a "č" when he generates the HTML. Graeme, could you please describe how you generate the HTML? I assume that you simply emit your text via an ISO-8859-1 (*) encoding Writer, which converts the non-ISO-8859-1 character to a question mark. If so, you could replace it with a Writer that uses UTF-8 and declare the encoding used via a


 <meta http-equiv="content-type" content="text/html; charset=UTF-8">

within the <head> section. If you generate your HTML within a JSP page, you need to use the appropriate methods provided by this platform instead. Please note that generating HTML (or XML) by hand also requires the proper handling of the special characters <, & and " (the latter within attribute values) -- something that many people simply forget.
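
A minimal sketch of both suggestions in plain Java, assuming the HTML is written by hand to an output stream. The class HtmlOutput and its methods escapeHtml and renderPage are hypothetical names chosen for this example, not part of any library. The Writer is created with an explicit UTF-8 charset rather than the platform default, and text is escaped before it is written.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class HtmlOutput {

    /** Escapes the characters that are special in HTML text and attribute values. */
    static String escapeHtml(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Renders a tiny page as UTF-8 bytes, declaring the same encoding in the head. */
    static byte[] renderPage(String bodyText) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Explicit charset: do not rely on the platform default encoding.
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write("<html><head>");
            w.write("<meta http-equiv=\"content-type\" content=\"text/html; charset=UTF-8\">");
            w.write("</head><body><p>");
            w.write(escapeHtml(bodyText));
            w.write("</p></body></html>");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // U+010D survives as the two UTF-8 bytes 0xC4 0x8D
        byte[] page = renderPage("\u010D < & \"");
        System.out.println(new String(page, StandardCharsets.UTF_8));

        // With ISO-8859-1 the same character cannot be encoded and becomes '?'
        byte[] lossy = "\u010D".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println((char) lossy[0]);
    }
}
```

In a servlet or JSP setting the container supplies the Writer, so the encoding would be set through the platform's own mechanism rather than by constructing an OutputStreamWriter directly.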



Klaus


* which is the default encoding on many platforms

