I have been through and fixed some problems which prevented some of
the HTML files from validating. AFAICT, everything now validates (with
the sole exception of missing "alt" attributes within <img> tags).

Please ensure that all HTML files continue to validate against the
HTML 4.0 Transitional DTD. At some point, I want to replace g.html2man
with something more robust (e.g. something which handles tables), and
I don't particularly want to make a "smart" (i.e. fault-tolerant) HTML
parser (e.g. Beautiful Soup) a required dependency.

If you have OpenSP or OpenJade, you can validate an HTML file with
e.g.:

        nsgmls -s -c /usr/share/sgml/openjade-1.3.2/pubtext/HTML4.soc <filename>.html

[The program may be called nsgmls or onsgmls, and the exact location
where the catalogues are installed will vary.]

This needs to be done on the completed HTML file in
dist.<arch>/docs/html; the <module>.html files in the module
directories won't normally validate, as they lack the header which is
added by running the module with the --html-description option.
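If you want to check the whole directory in one go, something along these lines should do. It's a rough Python sketch, not part of GRASS: validate_command() and validate_tree() are hypothetical names, the program name and catalogue path are just the ones from the example above (adjust them for your system), and it assumes the validator exits with a non-zero status when it reports errors.

```python
import subprocess
import sys
from pathlib import Path

# Assumptions: adjust both for your system (the program may be called
# onsgmls, and the catalogue location varies).
VALIDATOR = "nsgmls"
CATALOG = "/usr/share/sgml/openjade-1.3.2/pubtext/HTML4.soc"


def validate_command(html_file, validator=VALIDATOR, catalog=CATALOG):
    """Build the validator command line for a single HTML file."""
    return [validator, "-s", "-c", catalog, str(html_file)]


def validate_tree(html_dir):
    """Validate every *.html file in html_dir; return the files that failed.

    Assumes the validator exits with a non-zero status when it reports errors.
    """
    failed = []
    for html_file in sorted(Path(html_dir).glob("*.html")):
        result = subprocess.run(validate_command(html_file),
                                capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(html_file)
            # -s suppresses the normal output, so what remains is diagnostics
            sys.stderr.write(result.stderr)
    return failed
```

E.g. validate_tree("dist.<arch>/docs/html") (with <arch> filled in) returns the files which need attention, and their error messages are copied to stderr.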

FWIW, the most common error was using block elements (e.g. <div>,
<pre>, <p>) in contexts where only inline elements are allowed
(primarily <dt>).
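The <dt> case is easy enough to catch before running the full validator. A rough sketch using Python's standard html.parser module; DtBlockChecker is a hypothetical name, and it only knows about DT (not P, headings, etc.), relying on the fact that DT's end tag is implied by a following DT or DD.

```python
from html.parser import HTMLParser

# Element names in the %block entity of the HTML 4.0 Transitional DTD
# (html.parser reports tag names in lower case).
BLOCK = {
    "p", "dl", "div", "center", "noscript", "noframes", "blockquote",
    "form", "isindex", "hr", "table", "fieldset", "address",
    "h1", "h2", "h3", "h4", "h5", "h6", "ul", "ol", "dir", "menu", "pre",
}


class DtBlockChecker(HTMLParser):
    """Record (line, tag) for every block start tag found inside a <dt>."""

    def __init__(self):
        super().__init__()
        self.in_dt = False
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            # DT's end tag is optional: a following DT or DD ends it.
            self.in_dt = tag == "dt"
        elif self.in_dt and tag in BLOCK:
            line, _col = self.getpos()
            self.errors.append((line, tag))

    def handle_endtag(self, tag):
        if tag in ("dt", "dl"):
            self.in_dt = False
```

Feed it the text of a file with checker.feed(text), call checker.close(), then look at checker.errors.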

You can determine which elements are allowed where from the DTD:

http://www.w3.org/TR/1998/REC-html40-19980424/sgml/loosedtd.html

E.g. the definition:

<!ELEMENT DT - O (%inline;)*           -- definition term -->

indicates that only inline elements are allowed inside DT, while e.g.:

<!ELEMENT DD - O (%flow;)*             -- definition description -->

indicates that both block and inline elements are allowed inside DD.
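If you'd rather look things up mechanically than by eye, one-line declarations like the two above can be picked apart with a regular expression. A minimal Python sketch; parse_element_decl() and allows_blocks() are hypothetical helpers, and declarations which name a group of elements (e.g. the %heading one) or span several lines aren't handled.

```python
import re

# Matches one-line declarations such as
#   <!ELEMENT DT - O (%inline;)*  -- definition term -->
# capturing the element name, the start/end tag flags ("-" means the tag
# is required, "O" means it may be omitted) and the content model.
ELEMENT_DECL = re.compile(
    r"<!ELEMENT\s+(?P<name>[A-Za-z][A-Za-z0-9]*)\s+"
    r"(?P<start>[-O])\s+(?P<end>[-O])\s+"
    r"(?P<model>.*?)\s*(?:--.*?--\s*)?>"
)


def parse_element_decl(decl):
    """Return (name, start_flag, end_flag, content_model) for a declaration."""
    match = ELEMENT_DECL.match(decl.strip())
    if match is None:
        raise ValueError(f"unsupported declaration: {decl!r}")
    return match.group("name", "start", "end", "model")


def allows_blocks(model):
    """True if the content model admits block elements via %flow; or %block;."""
    return "%flow;" in model or "%block;" in model
```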

If you don't want to read the DTD, here's a rough summary:

Entity classes:

        %StyleSheet     = <CSS stylesheet>
        %Script         = <JavaScript code>
        
        %html.content   = HEAD, BODY
        %head.content   = TITLE, ISINDEX, BASE
        %heading        = H1, H2, H3, H4, H5, H6
        %fontstyle      = TT, I, B, U, S, STRIKE, BIG, SMALL
        %phrase         = EM, STRONG, DFN, CODE, SAMP, KBD, VAR, CITE, ABBR,
                          ACRONYM
        %special        = A, IMG, APPLET, OBJECT, FONT, BASEFONT, BR, SCRIPT,
                          MAP, Q, SUB, SUP, SPAN, BDO, IFRAME
        %formctrl       = INPUT, SELECT, TEXTAREA, LABEL, BUTTON
        %list           = UL, OL, DIR, MENU
        %head.misc      = SCRIPT, STYLE, META, LINK, OBJECT
        %pre.exclusion  = IMG, OBJECT, APPLET, BIG, SMALL, SUB, SUP,
                          FONT, BASEFONT
        %preformatted   = PRE
        %block          = P, DL, DIV, CENTER, NOSCRIPT, NOFRAMES,
                          BLOCKQUOTE, FORM, ISINDEX, HR, TABLE, FIELDSET,
                          ADDRESS, %heading, %list, %preformatted
        %inline         = #PCDATA, %fontstyle, %phrase, %special, %formctrl
        %flow           = %block, %inline
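The entity classes nest (%flow includes %block, which includes %heading, and so on), so it can help to expand them into flat sets of element names. A Python sketch which transcribes the relevant entries above (%StyleSheet, %Script and the head-related entries are left out) and expands them recursively; expand() is a hypothetical helper.

```python
# Entity classes from the summary above. Items starting with "%" refer to
# other entities; everything else is an element name (or #PCDATA).
ENTITIES = {
    "heading": ["H1", "H2", "H3", "H4", "H5", "H6"],
    "fontstyle": ["TT", "I", "B", "U", "S", "STRIKE", "BIG", "SMALL"],
    "phrase": ["EM", "STRONG", "DFN", "CODE", "SAMP", "KBD", "VAR", "CITE",
               "ABBR", "ACRONYM"],
    "special": ["A", "IMG", "APPLET", "OBJECT", "FONT", "BASEFONT", "BR",
                "SCRIPT", "MAP", "Q", "SUB", "SUP", "SPAN", "BDO", "IFRAME"],
    "formctrl": ["INPUT", "SELECT", "TEXTAREA", "LABEL", "BUTTON"],
    "list": ["UL", "OL", "DIR", "MENU"],
    "preformatted": ["PRE"],
    "block": ["P", "DL", "DIV", "CENTER", "NOSCRIPT", "NOFRAMES",
              "BLOCKQUOTE", "FORM", "ISINDEX", "HR", "TABLE", "FIELDSET",
              "ADDRESS", "%heading", "%list", "%preformatted"],
    "inline": ["#PCDATA", "%fontstyle", "%phrase", "%special", "%formctrl"],
    "flow": ["%block", "%inline"],
}


def expand(entity, entities=ENTITIES):
    """Recursively expand an entity class into a set of element names."""
    result = set()
    for item in entities[entity]:
        if item.startswith("%"):
            result |= expand(item[1:], entities)
        else:
            result.add(item)
    return result


INLINE = expand("inline")
BLOCK = expand("block")
FLOW = expand("flow")
```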

The immediate children permitted for each element are:
        
        A:              %inline
        ABBR:           %inline
        ACRONYM:        %inline
        ADDRESS:        %inline, P
        APPLET:         %flow, PARAM
        B:              %inline
        BDO:            %inline
        BIG:            %inline
        BLOCKQUOTE:     %flow
        BODY:           %flow, INS, DEL
        BUTTON:         %flow
        CAPTION:        %inline
        CENTER:         %flow
        CITE:           %inline
        CODE:           %inline
        COLGROUP:       COL
        DD:             %flow
        DEL:            %flow
        DFN:            %inline
        DIR:            LI
        DIV:            %flow
        DL:             DT, DD
        DT:             %inline
        EM:             %inline
        FIELDSET:       %flow, LEGEND
        FONT:           %inline
        FORM:           %flow
        FRAMESET:       FRAMESET, FRAME, NOFRAMES
        H1:             %inline
        H2:             %inline
        H3:             %inline
        H4:             %inline
        H5:             %inline
        H6:             %inline
        HEAD:           %head.content, %head.misc
        HTML:           %html.content
        I:              %inline
        IFRAME:         %flow
        INS:            %flow
        KBD:            %inline
        LABEL:          %inline
        LEGEND:         %inline
        LI:             %flow
        MAP:            %block, AREA
        MENU:           LI
        NOFRAMES:       %flow
        NOSCRIPT:       %flow
        OBJECT:         %flow, PARAM
        OL:             LI
        OPTGROUP:       OPTION
        OPTION:         #PCDATA
        P:              %inline
        PRE:            %inline
        Q:              %inline
        S:              %inline
        SAMP:           %inline
        SCRIPT:         %Script
        SELECT:         OPTGROUP, OPTION
        SMALL:          %inline
        SPAN:           %inline
        STRIKE:         %inline
        STRONG:         %inline
        STYLE:          %StyleSheet
        SUB:            %inline
        SUP:            %inline
        TABLE:          CAPTION, COL, COLGROUP, THEAD, TFOOT, TBODY
        TBODY:          TR
        TD:             %flow
        TEXTAREA:       #PCDATA
        TFOOT:          TR
        TH:             %flow
        THEAD:          TR
        TITLE:          #PCDATA
        TR:             TH, TD
        TT:             %inline
        U:              %inline
        UL:             LI
        VAR:            %inline
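To apply the table mechanically, one option is to encode it as a dictionary and walk a parsed document tree. The following is a hypothetical Python illustration: it transcribes only a handful of rows, uses abbreviated inline/block sets, checks membership only (not the order or number of children, which matters for e.g. TABLE), and assumes the tree already has any omitted tags (such as TBODY) filled in.

```python
# A few rows of the table above. INLINE and BLOCK are abbreviated here;
# a real checker would use the full entity classes.
INLINE = {"#PCDATA", "TT", "I", "B", "EM", "CODE", "A", "SPAN", "BR", "IMG"}
BLOCK = {"P", "DIV", "PRE", "DL", "UL", "OL", "TABLE", "BLOCKQUOTE", "H1", "H2"}
FLOW = INLINE | BLOCK

CHILDREN = {
    "DL": {"DT", "DD"},
    "DT": INLINE,
    "DD": FLOW,
    "P": INLINE,
    "PRE": INLINE,
    "DIV": FLOW,
    "UL": {"LI"},
    "LI": FLOW,
    "TABLE": {"CAPTION", "COL", "COLGROUP", "THEAD", "TFOOT", "TBODY"},
    "TBODY": {"TR"},
    "TR": {"TH", "TD"},
    "TD": FLOW,
}


def check_children(tree, errors=None):
    """Walk a (tag, [children]) tree; collect (parent, child) pairs not allowed.

    Text nodes are represented by the string "#PCDATA". Parents missing
    from CHILDREN are not checked.
    """
    if errors is None:
        errors = []
    tag, children = tree
    allowed = CHILDREN.get(tag)
    for child in children:
        child_tag = child if isinstance(child, str) else child[0]
        if allowed is not None and child_tag not in allowed:
            errors.append((tag, child_tag))
        if not isinstance(child, str):
            check_children(child, errors)
    return errors
```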

Some elements don't allow certain elements as descendants (at any depth, not just as immediate children):

        A:              A
        BUTTON:         %formctrl, A, FORM, ISINDEX, FIELDSET, IFRAME
        DIR:            %block
        FORM:           FORM
        LABEL:          LABEL
        MENU:           %block
        PRE:            %pre.exclusion
        TITLE:          %head.misc
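Because these exclusions apply to all descendants rather than just children, a checker has to look at every open ancestor, not only the parent. A minimal Python sketch; the %block set is abbreviated and excluded_by() is a hypothetical name.

```python
# Exclusions from the list above, with %-entities written out by hand.
# The %block set is abbreviated for the sake of the sketch.
FORMCTRL = {"INPUT", "SELECT", "TEXTAREA", "LABEL", "BUTTON"}
PRE_EXCLUSION = {"IMG", "OBJECT", "APPLET", "BIG", "SMALL", "SUB", "SUP",
                 "FONT", "BASEFONT"}
HEAD_MISC = {"SCRIPT", "STYLE", "META", "LINK", "OBJECT"}
BLOCK = {"P", "DIV", "PRE", "DL", "UL", "OL", "TABLE", "BLOCKQUOTE"}

EXCLUSIONS = {
    "A": {"A"},
    "BUTTON": FORMCTRL | {"A", "FORM", "ISINDEX", "FIELDSET", "IFRAME"},
    "DIR": BLOCK,
    "FORM": {"FORM"},
    "LABEL": {"LABEL"},
    "MENU": BLOCK,
    "PRE": PRE_EXCLUSION,
    "TITLE": HEAD_MISC,
}


def excluded_by(ancestors, tag):
    """Return the outermost open ancestor that excludes tag, or None.

    ancestors lists the currently open elements from the root downwards.
    Exclusions apply at any depth, so every ancestor is checked, not just
    the immediate parent.
    """
    for ancestor in ancestors:
        if tag in EXCLUSIONS.get(ancestor, ()):
            return ancestor
    return None
```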

Notes:

1. The children of DIR/MENU are LI, whose own content model is %flow,
but because DIR/MENU exclude %block from all of their descendants,
those LI can't contain block elements. UL/OL don't have this
restriction.

2. DT cannot contain block elements, but DD can. This means that you
can't use <div class="code"><pre> in a DT; use <span class="code"><tt>
instead. DIV and PRE are block elements; SPAN and TT are inline.

3. TABLE cannot have TR as a child. But TBODY can have TR, and TBODY
allows both the start and end tags to be omitted, so
<table><tr>....</tr></table> is really just a shorthand for
<table><tbody><tr>....</tr></tbody></table>.

4. P cannot contain blocks. So <p>...<div> is actually shorthand for
<p>...</p><div>. But <p>...<div>...</div>...</p> is an error, as the
</p> doesn't match any open element (the <div> implicitly closed the
original <p>, and P doesn't allow the start tag to be omitted).

5. HTML, HEAD, BODY, and TBODY allow the start tag to be omitted. With
the exception of TBODY, this feature shouldn't be used (inferring the
omitted tag is a nuisance to implement if the number of valid child
tags is large).
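Notes 3 and 4 both come down to tag inference, which any strict parser has to handle. The following Python sketch models just those two cases: check_p_nesting() walks a flat list of tag names and applies P's implied end tag, and add_implied_tbody() wraps TR children of a TABLE node, represented as (tag, children) tuples, in an implied TBODY. Both names and the data representations are hypothetical, and the rest of the DTD is ignored.

```python
# Abbreviated set of block elements (EMPTY ones such as HR left out); a
# start tag for any of these implicitly ends an open P.
BLOCK = {"P", "DIV", "PRE", "DL", "UL", "OL", "TABLE", "BLOCKQUOTE",
         "CENTER", "FORM", "H1", "H2", "H3", "H4", "H5", "H6"}


def check_p_nesting(tags):
    """Check a sequence of tags such as ["p", "div", "/div", "/p"].

    Only P's omissible end tag is modelled; returns a list of error messages.
    """
    stack, errors = [], []
    for tag in tags:
        name = tag.lstrip("/").upper()
        if not tag.startswith("/"):
            if name in BLOCK and stack and stack[-1] == "P":
                stack.pop()  # implied </p>
            stack.append(name)
        elif name in stack:
            # Close the matching element and anything still open inside it
            # (e.g. a P whose end tag was omitted).
            while stack.pop() != name:
                pass
        else:
            errors.append(f"</{name.lower()}> does not match any open element")
    return errors


def add_implied_tbody(table):
    """Wrap each run of TR children of a ("TABLE", children) node in a TBODY.

    TBODY's start and end tags may be omitted, so a TR directly inside
    TABLE implies an open TBODY.
    """
    tag, children = table
    result, run = [], []
    for child in children:
        if child[0] == "TR":
            run.append(child)  # collect consecutive rows
            continue
        if run:
            result.append(("TBODY", run))  # close the implied TBODY
            run = []
        result.append(child)
    if run:
        result.append(("TBODY", run))
    return (tag, result)
```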

-- 
Glynn Clements <[EMAIL PROTECTED]>
_______________________________________________
grass-dev mailing list
[email protected]
http://lists.osgeo.org/mailman/listinfo/grass-dev
