On 07/01/12 19:00, Anthony J. Bentley wrote:
ropers writes:
This diff fixes things:

--- bsdcan11-mandoc-openbsd.html        2012-06-30 22:18:52.000000000 +0200
+++ bsdcan11-mandoc-openbsd.html.newentities    2012-06-30 22:34:58.000000000 +0200
@@ -13,7 +13,7 @@

  <p><a href="http://www.flickr.com/photos/tomkoadam/4778126822/"><img
  src="http://farm5.static.flickr.com/4115/4778126822_555b453a1e.jpg"></a></p>
-<p>Csiko - Foal. - Photo: Adam Tomko @flickr (CC)</p>
+<p>Csik&oacute; - Foal. - Photo: Adam Tomk&oacute; @flickr (CC)</p>

  <HR>
  <P>Ingo Schwarze: Mandoc in OpenBSD - page 2: INTRO I -
@@ -725,7 +725,7 @@
  <HR>
  <P>Ingo Schwarze: Mandoc in OpenBSD - page 22: RECURRING II -
  BSDCan 2011, May 13, Ottawa</P>
-<H1>Bogue deja vue:</H1>
+<H1>Bogue d&eacute;j&agrave; vue:</H1>
  <H2>Collecting regression tests.</H2>
  <UL>
  <LI>Slow start in 2009:

That's it. That's all.

The advantage of using pure ASCII plus HTML escapes in a page is that it
displays the correct content regardless of declared character encoding.
The disadvantage is that it means adding escapes *everywhere*. Can you
imagine writing http://www.openbsd.org/cs/ in anything but native UTF-8?
At some point we have to pick an encoding and stick with it.
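
To make the escaping approach concrete, here is a minimal Python sketch of
what "pure ASCII plus HTML escapes" means in practice. It is not part of the
diff above; the function name to_ascii_entities is made up for illustration,
and it assumes the text has already been decoded correctly and is existing
HTML (so &, < and > are left alone):

```python
from html.entities import codepoint2name

def to_ascii_entities(text):
    """Replace every non-ASCII character with an HTML escape.

    Hypothetical helper: uses a named entity where one exists,
    and a numeric character reference otherwise.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                            # plain ASCII passes through
        elif cp in codepoint2name:
            out.append("&%s;" % codepoint2name[cp])   # e.g. &oacute;
        else:
            out.append("&#%d;" % cp)                  # numeric fallback
    return "".join(out)

print(to_ascii_entities("Bogue d\u00e9j\u00e0 vue:"))
```

The result is the same regardless of which charset the browser guesses, but
every translated page would have to go through something like this.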

So again, the complaint was that there was mojibake gibberish in
Ingo's presentation, because the character encoding isn't specified
but defaults to UTF-8 in modern browsers, while the page is actually
iso-8859-1 encoded.

Actually, "modern" browsers do not default to a particular encoding (in
fact, assuming a default would violate the HTML standard). Instead, they
attempt to autodetect
the charset. Sometimes this works, and sometimes it doesn't -- I've seen
UTF-8 pages incorrectly detected as ISO-8859-1, and in particularly bad
cases, vice versa.
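
Both kinds of misdetection are easy to reproduce. The following is only a
sketch in Python, using the word from the diff above as sample data; the
variable names are arbitrary:

```python
word = "Csik\u00f3"  # the word "Csiko" with an acute accent on the final o

utf8_bytes = word.encode("utf-8")          # b'Csik\xc3\xb3'
latin1_bytes = word.encode("iso-8859-1")   # b'Csik\xf3'

# A UTF-8 page read as ISO-8859-1: every byte is valid, so the
# two-byte sequence silently turns into two wrong characters.
utf8_read_as_latin1 = utf8_bytes.decode("iso-8859-1")

# An ISO-8859-1 page read as UTF-8: 0xF3 is not a complete UTF-8
# sequence, so a lenient decoder substitutes U+FFFD.
latin1_read_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

print(repr(utf8_read_as_latin1))
print(repr(latin1_read_as_utf8))
```

The first case gives the classic two-character gibberish; the second gives
the replacement character.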

There were many objections to a simple addition of <HEAD><META
http-equiv="Content-Type" content="text/html; charset=iso-8859-1"
/></HEAD> as a fix.

Yes, this is pretty ugly. But the only alternative is using one encoding
everywhere and setting the appropriate HTTP header instead of an HTML
meta tag. Actually, that's not a bad idea, but it means using UTF-8 on all
pages, since that's the only encoding that can handle the different
translations on the OpenBSD website. It would also require removing or
altering meta tags on all pages (but considering the alternative is *adding*
meta tags to all pages...).
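
As a rough idea of what "altering meta tags on all pages" would involve,
here is a minimal Python sketch. It is an assumption about how one might do
it, not an existing tool: the function name convert_to_utf8 is hypothetical,
the source encoding must be supplied by the caller, and the regular
expression only handles the simple charset=... form shown earlier in this
thread.

```python
import re

# Matches the charset parameter inside a Content-Type meta tag,
# e.g. charset=iso-8859-1 (simple form only; an assumption).
CHARSET_PARAM = re.compile(r"(charset=)[A-Za-z0-9_-]+", re.IGNORECASE)

def convert_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode a page as UTF-8 and update its declared charset."""
    text = raw.decode(source_encoding)           # fails loudly if the guess is wrong
    text = CHARSET_PARAM.sub(r"\1utf-8", text)   # rewrite the meta declaration
    return text.encode("utf-8")

page = (
    '<META http-equiv="Content-Type" '
    'content="text/html; charset=iso-8859-1">'
    "<p>Csik\u00f3 - Foal.</p>"
).encode("iso-8859-1")

converted = convert_to_utf8(page, "iso-8859-1")
print(converted.decode("utf-8"))
```

Pages without any declaration would still need their real encoding worked
out by hand before a conversion like this is safe.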

But then I thought, what about browsers that don't support UTF-8 yet;
this is going to break things for them.

I challenge you to find a single browser in ports that doesn't. IE6
supports UTF-8 properly. Even Lynx works fine when the user has a UTF-8
locale. (And ISO-8859-* are also locale-dependent, so this is not any
worse.)


So, in summary, the options are:

Use HTML escapes everywhere. IMO, highly impractical.

Use any encoding you wish, and set a meta tag when appropriate. This is
basically what we have now. (The front pages of /, /de/, /fr/ all use
ISO-8859-1; /cs/ uses UTF-8; /lt/ uses ISO-8859-13.)

My vote's on this.

/Alexander


Use UTF-8 everywhere, and enforce this either with an HTTP header or
meta tags.

--
Anthony J. Bentley
