Ingo Meyer wrote:

To anybody: Does this mean that "iso-8859-1" is the default encoding
for HTML?

Then I will always call: new String(bytes, "iso-8859-1");

Hi Ingo,

no, you shouldn't assume ISO-8859-1 in all cases, although it's a reasonable guess when everything else fails. Plenty of HTML documents on the Web use other encodings. For example, XHTML pages processed as XML default to UTF-8 (or UTF-16 with a BOM) unless an encoding is declared.

Finding the proper encoding for an HTML page may require a couple of checks. Here's what you can do:

1. Check the charset parameter of the Content-Type HTTP header.

2. Look for Unicode Byte Order Marks (BOMs) at the beginning of the data.

3. Look for an XML declaration and check the encoding attribute (XHTML pages).

4. Look for <meta http-equiv="Content-Type"> elements.

Only after all of the above checks fail would I use ISO-8859-1 as a guess.
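
The checks above could be chained roughly as follows. This is only a minimal sketch, not a complete implementation: the class name CharsetSniffer, the method detect, the 1024-byte scan window and the regular expressions are all my own assumptions, and it ignores UTF-32 BOMs and markup that is itself encoded as UTF-16 without a BOM.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class CharsetSniffer {

    // charset=... as found in a Content-Type value (HTTP header or <meta> tag).
    private static final Pattern CHARSET_PARAM = Pattern.compile(
        "charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)", Pattern.CASE_INSENSITIVE);

    // encoding="..." inside an XML declaration.
    private static final Pattern XML_ENCODING = Pattern.compile(
        "<\\?xml[^>]*\\bencoding\\s*=\\s*[\"']([A-Za-z0-9._:-]+)[\"']",
        Pattern.CASE_INSENSITIVE);

    // A whole <meta http-equiv="Content-Type" ...> tag.
    private static final Pattern META_CONTENT_TYPE = Pattern.compile(
        "<meta[^>]*http-equiv\\s*=\\s*[\"']?Content-Type[\"']?[^>]*>",
        Pattern.CASE_INSENSITIVE);

    static Charset detect(String contentTypeHeader, byte[] bytes) {
        // 1. charset parameter of the HTTP Content-Type header (may be null).
        Charset cs = fromCharsetParam(contentTypeHeader);
        if (cs != null) return cs;

        // 2. Byte Order Mark at the beginning of the data.
        cs = fromBom(bytes);
        if (cs != null) return cs;

        // Decode a prefix as ISO-8859-1: every byte maps to exactly one char,
        // so ASCII markup can be searched whatever the real encoding is.
        int n = Math.min(bytes.length, 1024); // assumed scan window
        String head = new String(bytes, 0, n, StandardCharsets.ISO_8859_1);

        // 3. encoding attribute of an XML declaration (XHTML pages).
        Matcher m = XML_ENCODING.matcher(head);
        if (m.find()) {
            cs = lookup(m.group(1));
            if (cs != null) return cs;
        }

        // 4. <meta http-equiv="Content-Type" content="...; charset=...">.
        m = META_CONTENT_TYPE.matcher(head);
        if (m.find()) {
            cs = fromCharsetParam(m.group());
            if (cs != null) return cs;
        }

        // Last resort: guess ISO-8859-1.
        return StandardCharsets.ISO_8859_1;
    }

    private static Charset fromCharsetParam(String value) {
        if (value == null) return null;
        Matcher m = CHARSET_PARAM.matcher(value);
        return m.find() ? lookup(m.group(1)) : null;
    }

    private static Charset fromBom(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF
                && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;
        }
        return null;
    }

    // Returns null for names that are malformed or unknown to this JVM.
    private static Charset lookup(String name) {
        try {
            return Charset.forName(name);
        } catch (IllegalArgumentException e) {
            return null;
        }
    }
}
```

You would then decode with new String(bytes, CharsetSniffer.detect(header, bytes)) instead of hard-coding "iso-8859-1". Note that a BOM, if present, stays in the decoded string as U+FEFF and may need to be skipped by the caller.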

Cheers, Oliver

