On Wed, 14 Apr 1999, Fred Condo wrote:
>Geoff Hutchison wrote:
>> 
>> I finally had some time to sit down and do some ht://Dig work. I've been
>> swamped with getting our server back to speed after several accounts were
>> compromised. :-( It turns out someone managed to slip a packet sniffer onto
>> our network. <ugh>
>> 
>> Anyway, I cleaned out the incoming folder of the bug system. This was one
>> of the messages posted.
>> 
>> Is this correct? If so, what should we do about it? We can't use | because
>> that already has a meaning for ht://Dig. Furthermore, we'll still have to
>> parse & separators because many browsers (and a *lot* of URLs) still use it.
>> 
>> Anyone have a good suggestion for a separator? I'd go for * offhand, but I
>> might be missing some horrible consequence (I was going to suggest # first
>> and realized the error of my ways).
>> 
>> -Geoff
>> 
>> Date: Sat, 3 Apr 1999 20:15:05 -0800
>> From: [EMAIL PROTECTED]
>> To: [EMAIL PROTECTED]
>> Subject: PRIVATE: Use of & as CGI variable separator vs. HTML 4.0
>> 
>> Full_Name: Fred Condo
>> Version: 3.1.1
>> OS: FreeBSD 2.2.8
>> Submission from: pm3dyn102.dip.csuchico.edu (132.241.249.102)
>> 
>> HTML 4.0 strict does not permit the & character in the URLs generated as CGI
>> variable separators for the page list. This is because the & introduces a
>> general entity.
>> 
>> W3C recommend rewriting the code to use a different separator, such as | or ;
>> that does not have special meaning in HTML.
>> 
>> Until this is done, ht://dig cannot emit valid HTML 4.0 (strict).
>
>I'm the originator of this bug report, and it occurs to me there are two
>separate problems.
>
>First is the one I had in mind when reporting it: the pages list of links when
>there are more results than fit on one ht://dig results page. Those links look
>like this:
>
>http://webclass.csuchico.edu/cgi-bin/htsearch?restrict=&exclude=&config=webclass&method=and&format=builtin%2Dlong&words=HTML&page=7
>
>Htsearch generates and uses this, so it shouldn't be a big matter to change the
>separator. A quick check with the W3C validator shows that encoding the
>ampersands as &amp; validates under HTML 4.0 (strict) and works with Netscape
>4.51.
>
>The second class of URL emitted by htsearch is a link to a page in the search
>database. The default exclusion list in the sample configuration file disallows
>CGI scripts, which are, I would guess, the principal users of the & separator.
>But it's conceivable that there are still URLs that have & in them. I don't
>know that there is any easy answer for this, unless the &amp; solution noted
>above is generally good.
>
>The reference for the invalidity of the naked & in URLs is at
>http://www.cs.duke.edu/~dsb/kgv-faq/errors.html#bad-entity
>
>The W3C Validator is at http://validator.w3.org/

AFAIK neither URLs nor URIs are defined by the HTML 4.0 specification itself.
In fact the spec (section 6.4) refers to RFCs 1630 (URI) and 1808 (URL).
AFAIK CGI parameters and their respective separators are covered by those RFCs.
IMHO the W3C cannot change their meaning unless those RFCs are changed, too.
Furthermore, changing the separator would break compatibility with *all*
CGI applications on the web, which cannot be the W3C's job.

You are right that those characters are invalid wherever they are *not*
simply part of a URL or URI that refers to a document, i.e. when they
are emitted not as plain text but inside HREF attributes, where they
need to be quoted.
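
To make the quoting step concrete, here is a minimal sketch in Python. It
is not ht://Dig code (htsearch is C++), and the helper names href_anchor
and split_query are made up for illustration. It assumes the page-list
link from Fred's report, escapes it for use inside an HREF attribute, and
shows a parser that accepts both & and ; as separators, in case a second
separator is ever added.

```python
import html
import re
from urllib.parse import unquote

# Page-list link quoted in the bug report.
PAGE_URL = (
    "http://webclass.csuchico.edu/cgi-bin/htsearch"
    "?restrict=&exclude=&config=webclass&method=and"
    "&format=builtin%2Dlong&words=HTML&page=7"
)


def href_anchor(url, label):
    """Build an <a> element with the URL escaped for an attribute value.

    Hypothetical helper: every & in the URL becomes &amp; in the markup.
    """
    return '<a href="%s">%s</a>' % (
        html.escape(url, quote=True),
        html.escape(label),
    )


def split_query(query):
    """Split a query string on either & or ; and decode %XX escapes.

    Hypothetical helper: returns (name, value) pairs in their original order.
    """
    pairs = []
    for field in re.split(r"[&;]", query):
        if not field:
            continue  # ignore empty fields such as a trailing separator
        name, _, value = field.partition("=")
        pairs.append((unquote(name), unquote(value)))
    return pairs


anchor = href_anchor(PAGE_URL, "7")
escaped = html.escape(PAGE_URL, quote=True)

# A browser undoes the entity encoding before it issues the request,
# so the CGI program still receives plain & separators.
assert html.unescape(escaped) == PAGE_URL
```

The point of the round trip at the end is that only the HTML markup
changes; the URL sent to the server is the same, so existing CGI
programs should not need to change.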


cheers,
  Torsten

--
InWise - Wirtschaftlich-Wissenschaftlicher Internet Service GmbH
Waldhofstraße 14                            Tel: +49-4101-403605
D-25474 Ellerbek                            Fax: +49-4101-403606
E-Mail: [EMAIL PROTECTED]            Internet: http://www.inwise.de

------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED] containing the single word "unsubscribe" in
the SUBJECT of the message.
