Hi Bob. Do the stylesheets output all of HTML 4, HTML 5, XHTML and XHTML5?
Or did you conflate html 4 and html 5? See more below.
On 14 Aug 2017, at 18:48, Bob Stayton wrote:
We have a bug report suggesting that the default output encoding for
the DocBook html stylesheet be changed from ISO-8859-1 to UTF-8.
I agree with this bug report. Why? Well, for one thing, you - here -
talk about "html", and "html" today means "html 5". HTML 5.x recommends
that documents are authored using UTF-8.
Also, when I look at the link in the forwarded message
(https://www.oxygenxml.com/forum/viewtopic.php?f=6&t=14812&p=43711#p43711),
I note that the discussion thread talks about HTML 5. I am not able to
see that HTML 4 is mentioned at all in that thread.
Note this only applies to the original HTML 4 output from the "html"
directory.
Are you saying that the stylesheet also outputs HTML 5? (Note that I ask
about "HTML 5" and not about xhtml or xhtml5.)
The "xhtml" and "xhtml5" outputs already output UTF-8.
The justification for that ought to be that XML defaults to UTF-8. Xhtml
and xhtml5 are not 'html'.
The original HTML 4 standard said ISO-8859-1 was the default encoding,
but that UTF-8 would be acceptable.
I am not able to find such a statement in the HTML 4 specification. I
looked at the one page version: https://www.w3.org/TR/html401/html40.txt
UTF-8 "took over" as the dominant encoding on the Web long before
HTML 5 became the official version of HTML.
Technically speaking ISO-8859-1 is STILL the default HTML encoding, from
user agents’ perspective. It is only from an authoring perspective
that HTML 5 recommends UTF-8.
The DocBook stylesheets are an authoring tool. There is only one processing
model for HTML, and that model is defined by the latest HTML spec. Thus
they should use UTF-8.
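To illustrate why the difference between the user agent default and the authoring recommendation matters, here is a small Python sketch. It is only an illustration: the sample character and the variable names are my own assumptions, and it uses strict ISO-8859-1 rather than whatever fallback a particular browser really applies.

```python
# Sketch only: sample text and names are illustrative assumptions.
sample = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# The bytes an authoring tool writes when it outputs UTF-8.
utf8_bytes = sample.encode("utf-8")

# A consumer that falls back to ISO-8859-1 reads each byte as one character.
misread = utf8_bytes.decode("iso-8859-1")

# A consumer that honours a UTF-8 label recovers the original text.
correct = utf8_bytes.decode("utf-8")

print(repr(misread), repr(correct))
```

The same bytes give two different texts, so UTF-8 output is only safe when the document also declares UTF-8.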
At the very least, the DocBook stylesheet should not use the HTML 4
specification as a justification for failing to output HTML 5 as UTF-8.
It isn't difficult for a user to change the output to UTF-8, but it
does require a customization. The question here is whether to change
the default output encoding to UTF-8.
If the user has to change the output to UTF-8 in order to produce HTML 5
output, then the stylesheet does not follow HTML5’s recommendations.
The fact that the user can produce XHTML - and thus automatically get
UTF-8 - does not alter the picture.
This would change the HTML output to replace character references like
&#xXXXX; to actual UTF-8 encoded characters, and change the encoding
information in the header to reflect that.
This would be nice. But just for the record: HTML 5.x does not recommend
against using character references. Thus, if need be, you CAN pick a
compromise: you can continue to output the character references and yet
label the document as <meta http-equiv="Content-Type"
content="text/html;charset=UTF-8">. This would then meet HTML 5’s
recommendation.
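The reason this compromise is safe can be sketched in Python. This is a rough illustration under my own assumptions: the helper name to_char_refs and the sample string are hypothetical, and Python's "xmlcharrefreplace" handler emits decimal references rather than the hexadecimal form mentioned above. The stylesheets themselves do not work this way; the sketch only shows the byte-level property.

```python
import html


def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal character reference."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


# Hypothetical sample: a Russian word, as in the forwarded bug report's use case.
sample = "\u041f\u0440\u0438\u0432\u0435\u0442"

escaped = to_char_refs(sample)

# The escaped text is pure ASCII, so its bytes are identical in both encodings.
as_utf8 = escaped.encode("utf-8")
as_latin1 = escaped.encode("iso-8859-1")

print(escaped)
print(as_utf8 == as_latin1)
# An HTML-aware consumer still recovers the original characters.
print(html.unescape(escaped) == sample)
```

Since a document that contains only ASCII plus character references has the same bytes under ISO-8859-1 and UTF-8, relabelling it as UTF-8 does not change how it is read.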
I'm reluctant to change something that will break the builds that
DocBook people depend on. Would this impact you if the change was
made?
One thing to perhaps consider is whether interaction between external CSS
stylesheets (that DocBook may produce) and the HTML output is affected.
I do not think so, but perhaps there are some edge cases. If you need, I
can look into it.
Leif
Bob Stayton
-------- Forwarded Message --------
[bugs:#1400] Default encoding for HTML-based outputs
Status: open
Group: output: HTML
Created: Thu Aug 10, 2017 11:41 AM UTC by Radu Coravu
Last Updated: Thu Aug 10, 2017 11:41 AM UTC
Owner: nobody
One of our clients reported that the default output encoding for
Docbook to HTML is ISO 8859-1 which is not suitable at all for other
languages with extended char sets like Russian:
https://www.oxygenxml.com/forum/viewtopic.php?f=6&t=14812&p=43711#p43711
Maybe the default language for HTML (and also for HTML chunk) should
be changed to be UTF-8 as UTF-8 is already used as the default
language for XHTML.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail:
[email protected]