"Robert Bergs" <[EMAIL PROTECTED]> writes:
> >>> But "pre"? Who ever said that "pre"'s content is CDATA?
>
> Yes, I agree. The HTML 4.01 specification states,
>
> "CDATA is a sequence of characters from the document character set and may
> include character entities. User agents should interpret attribute values as
> follows:
>
> Replace character entities with characters,
> Ignore line feeds,
> Replace each carriage return or tab with a single space."
>
> It goes on to say that for <style> and <script>, the CDATA must be treated
> differently by the User Agent.
I would say that this definition of CDATA is wrong.
The "SGML Handbook" by Goldfarb quotes the following definitions from
ISO 8879:
    4.28 CDATA: Character data
    4.33 character data: Zero or more characters that occur in a context
         in which no markup is recognized, other than the delimiters that
         end the character data. Such characters are classified as data
         characters because they were declared to be so.
Also take a look at http://www.sgml.u-net.com/book/sgml-8.htm or read
the XML 1.0 spec on CDATA sections: <![CDATA[ ... ]]>.
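To make the "no markup is recognized" part concrete, here is a minimal Python sketch using the standard xml.etree.ElementTree module. The element name "note" and the sample text are made up for illustration. Inside the CDATA section neither the tags nor the entity reference are interpreted; only the closing "]]>" delimiter is recognized.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: everything between <![CDATA[ and ]]>
# is plain character data, even though it looks like markup.
doc = "<note><![CDATA[<b>AT&amp;T</b> is not markup here]]></note>"

root = ET.fromstring(doc)

# The text comes back verbatim: "<b>" is not an element and
# "&amp;" is not expanded to "&".
print(root.text)

# No child elements were created from the "<b>" inside the section.
print(len(root))
```

Note that XML, unlike SGML, does not allow whitespace inside the "<![CDATA[" delimiter.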
I am not really an SGML expert, but to me the problem appears to
be that when an attribute value in <!ATTLIST ...> is declared to be
CDATA it means something different than when the content of an
<!ELEMENT ...> is declared to be CDATA. In the first case entities are
replaced, but not in the second. Since entities are expanded in
attribute values they appear to be more like RCDATA, but the entity
expansion really happens before the attribute type is considered, so it
is kind of correct to have CDATA there after all (since it is CDATA
after entity expansion).
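The difference between the two cases can be observed with an HTML parser. The following is only a sketch, using Python's standard html.parser module as a stand-in (it is an HTML parser, not a real SGML parser). The Recorder class and the sample markup are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser

class Recorder(HTMLParser):
    """Collects attribute values and character data as the parser reports them."""

    def __init__(self):
        super().__init__()
        self.attrs = []
        self.data = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        self.attrs.extend(attrs)

    def handle_data(self, data):
        self.data.append(data)

p = Recorder()
# The same "R&amp;D" appears once in a CDATA attribute value and once
# in the content of <script>, which HTML declares as CDATA content.
p.feed('<a title="R&amp;D"></a><script>x = "R&amp;D";</script>')
p.close()

print(p.attrs)          # attribute value: entity reference replaced
print("".join(p.data))  # script content: entity reference left alone
```

With this parser the attribute value is reported as "R&D", while the script content still contains the literal "&amp;".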
The SGML BNF for "attribute value literal" looks like this:
    <attribute value literal> =
        ( '"' <replaceable character data>* '"' ) |
        ( "'" <replaceable character data>* "'" )

    <replaceable character data> =
        ( <character data> | <character reference> | <general entity reference> | Ee )*
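As a rough illustration of the grammar above, here is a Python sketch that splits a quoted attribute value literal into its pieces. It is deliberately simplified and not a faithful SGML implementation: it assumes numeric character references of the form "&#NNN;", entity references that always end in ";", and it ignores Ee (the entity end signal). The names TOKEN and tokenize_literal are invented for this example.

```python
import re

# Simplified token patterns (assumptions, not the full SGML rules):
#   charref: numeric character reference such as &#169;
#   entref:  general entity reference such as &amp;
#   cdata:   any run of other characters, or a stray "&"
TOKEN = re.compile(
    r"(?P<charref>&#[0-9]+;)"
    r"|(?P<entref>&[A-Za-z][A-Za-z0-9.-]*;)"
    r"|(?P<cdata>[^&]+|&)"
)

def tokenize_literal(literal):
    """Split a quoted attribute value literal into (kind, text) pairs."""
    if len(literal) < 2 or literal[0] not in "\"'" or literal[-1] != literal[0]:
        raise ValueError("not a quoted attribute value literal")
    body = literal[1:-1]
    if literal[0] in body:
        raise ValueError("closing delimiter appears inside the literal")
    # Exactly one named group matches per token, so lastgroup names its kind.
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(body)]

tokens = tokenize_literal('"AT&amp;T &#169; 2000"')
for kind, text in tokens:
    print(kind, repr(text))
```

Only after the charref and entref pieces have been replaced is the result treated as the CDATA attribute value.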
If you are not confused now, you ought to be :-)
Regards,
Gisle