> From: Michael Hanisch <[EMAIL PROTECTED]>
> Date: 2000-07-28 06:48:14 -0400

> Are you really positive about this?

Randal is 100% correct.

> AFAIK (and I just looked it up in the HTML 3.2 spec :-) the <A> tag is
> defined as follows:
>
> (quoted from http://www.w3.org/TR/REC-html32#sgmldecl)

> So the HREF-attribute is CDATA, and not PCDATA (like "normal" text) - to
> me this sounds like a plain "&" (ampersand character) is perfectly legal
> in this place.
> But then I'm no SGML wizz, so I might be wrong...

At first blush, you would seem to be right, but the situation is much
more subtle.

Quoting from "The SGML FAQ Book" by Steven DeRose:

Question1.9: Can I prevent entity recognition in attributes?

Answer: "No. Entity references are always recognized in attribute
values, even those of CDATA. The usual way to get around this is to
substitute "&amp;" for any needed ampersand characters....

SGML provides an attribute declared value called CDATA. However, this
term means something different for attributes than anywhere else it
is used in SGML. All attributes (regardless of their declared values)
have entities replaced, much like RCDATA content. Declared values are
better viewed as testing the value that results after all the
processing such as entity replacement, whitespace normalization, and
case-folding occur. .... "

So you must use &amp;. Because the set of HTML entities has normally
been small and fixed, an unescaped ampersand usually works, but it is
unsafe: a future HTML spec may introduce an entity whose name collides
with one of your URL parameters. With XML and XHTML coming, the
introduction of new entities will be uncontrolled, and the potential
for conflicts will be much greater.
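To make the hazard concrete, here is a small sketch in Python. The URL
and its "copy" parameter are hypothetical, chosen only because "copy"
also happens to be an entity name. Note that html.unescape applies
HTML5 rules for text content; an HTML5 browser reading an attribute
treats a semicolon-less "&copy" followed by "=" as literal, so this
shows the generic entity-decoding problem described above rather than
exactly what any particular browser does with an href.

```python
import html

# Hypothetical URL; the parameter name "copy" is also an entity name.
raw = "/search?journal=PRB&copy=2"

# An entity-aware decoder turns "&copy" into the copyright sign.
decoded = html.unescape(raw)
print(decoded)  # /search?journal=PRB©=2

# Escaping "&" as "&amp;" before placing the URL in an attribute
# (quote=True, the default, also escapes double and single quotes).
safe = html.escape(raw)
print(safe)  # /search?journal=PRB&amp;copy=2

# Decoding the escaped form recovers the intended URL exactly.
print(html.unescape(safe) == raw)  # True

anchor = f'<a href="{safe}">article</a>'
print(anchor)
```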

We publish physics articles in a series of journals, and we needed to
implement a uniform linking interface to the articles that
incorporated indirection, so that people can link robustly even if we
move our content to different platforms. We deliberately chose URLs
made of '/'-delimited fields rather than '?...&...' style URLs,
precisely because most people don't know they have to escape the
ampersands, and we didn't want people's links to break in the future
because of some new entity in the HTML spec.
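As a rough illustration of the difference, the Python sketch below
builds the same link both ways. The field names, values, and the
"/abstract" prefix are hypothetical, not the actual APS URL scheme.
The query-string form contains "&" and must be escaped inside an href;
the slash-delimited form contains no "&" and passes through escaping
unchanged.

```python
from html import escape
from urllib.parse import urlencode

# Hypothetical link fields, for illustration only.
fields = {"journal": "PRB", "volume": "61", "page": "1234"}

# Query-string style: parameters are joined with "&".
query_url = "/abstract?" + urlencode(fields)
# Slash-delimited style: fields are joined with "/", no "&" at all.
path_url = "/abstract/" + "/".join(fields.values())

print(escape(query_url))  # /abstract?journal=PRB&amp;volume=61&amp;page=1234
print(escape(path_url))   # /abstract/PRB/61/1234
```

The path form is safe to paste into an href as-is, which is the
property the design above relies on.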

Cheers,
Mark

Mark Doyle
Manager, Product Development
The American Physical Society
