Thanks Gilles,
A long response to your questions, but thanks for your agreement....
> > 1. Weak answer: Convenience and stability. The patch appears to be
> > local to one function within Display.cc, whereas within AOLserver there
> > is one function, and within OpenACS about ten functions, that would
> > require changing to understand the use of semicolons.
>
>Yikes. Ten functions to do the one simple task of parsing a query string?
>They're not into reusing code, are they?
They are into reusing code, but the toolkit has grown rapidly in the past
18 months through an open source community of several hundred developers
with widely varying levels of skill and communication. It's about
100,000 to 150,000 lines of Tcl and PL/SQL, with some Java and Perl
thrown in. Code is reused heavily, and the toolkit is actually pretty well
done, but yes, it could use a bit of cleanup and refactoring, and that's
exactly what's happening in the OpenACS 4.0 project.
> > 2. Stronger answer: Standards. As you mention, "there's no question that
> > the ampersand is still the standard...". (I understand the importance of
> > the word "still", which suggests that this may not be true in the future.)
>
>Well, now you're quoting me out of context. It's the standard
>separator for the CGI interface. But, W3C recommends ';', or at the
>very least, '&amp;' within URIs. The use of the semicolon may be a mere
>recommendation, but the HTML 4.0 standard (and more recent standards
>derived from it, i.e. 4.01 and XHTML) are pretty clear on the point
>that bare ampersands in URIs are a no-no. So we have a case of two
>conflicting standards, leaving us two options for resolving the conflict:
>1) use the separator that W3C suggests for URIs, while still recognizing
>the ampersand in query strings passed by CGI, or 2) use &amp; in URIs,
>which the browser will convert to a simple ampersand when it passes it
>back to the server when following the link.
>
>W3C isn't suggesting we change the CGI standard, at least I don't think
>they are, and neither am I. What I am suggesting is that CGI programs
>accept a dual-standard and recognize both separators. This strikes me
>as the ideal solution.
>
>What's wrong with the 2nd approach, i.e. using &amp;? Well, for
>one thing it's cumbersome and ugly (minor point, I know), but also
>because this doesn't address what, up until now, had been the biggest
>beef people had with the change from '&' to ';'. The main complaint,
>as far as I can recall, was with PHP wrappers that directly parse the
>URIs put out by htsearch, and not a problem with parsing the CGI input.
>PHP wrappers will still see the unprocessed '&amp;', and not a bare
>ampersand, so they still would need to be changed.
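To make the &amp; point concrete, here is a minimal C++ sketch of the three steps involved: htsearch escaping '&' for the href, the browser decoding it before following the link, and a naive wrapper splitting the raw HTML text on '&'. The function names and the "words"/"page" parameters are illustrative assumptions, not htsearch's actual code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Escape '&' as "&amp;" so a URL can sit inside an HTML href attribute.
// Only '&' is handled; a full encoder would also escape '<', '>' and quotes.
std::string escapeAmpersands(const std::string& url) {
    std::string out;
    for (char c : url) {
        if (c == '&') out += "&amp;";
        else out += c;
    }
    return out;
}

// Roughly what the browser does before following the link: turn "&amp;"
// back into '&'. Other entities are ignored in this simplified version.
std::string decodeAmpEntities(const std::string& attr) {
    std::string out;
    std::string::size_type i = 0;
    while (i < attr.size()) {
        if (attr.compare(i, 5, "&amp;") == 0) { out += '&'; i += 5; }
        else { out += attr[i]; ++i; }
    }
    return out;
}

// A naive wrapper that splits the raw HTML attribute text on '&'
// without decoding entities first.
std::vector<std::string> naiveSplitOnAmpersand(const std::string& s) {
    std::vector<std::string> parts;
    std::string::size_type start = 0, end;
    while ((end = s.find('&', start)) != std::string::npos) {
        parts.push_back(s.substr(start, end - start));
        start = end + 1;
    }
    parts.push_back(s.substr(start));
    return parts;
}
```

The last case is the one described above: splitting "words=foo&amp;page=2" gives a second piece of "amp;page=2" rather than "page=2", while a server that only ever sees the decoded link gets the plain ampersand.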
I suspect the W3C recommendation is similar in spirit to the guidance on
maximum URL lengths in RFC 2068:
>The HTTP protocol does not place any a priori limit on the length of a
>URI. Servers MUST be able to handle the URI of any resource they serve,
>and SHOULD be able to handle URIs of unbounded length if they provide
>GET-based forms that could generate such URIs.
In other words, it depends on your site's configuration, namely your
server and the applications you are running, and it doesn't affect
anything outside your site.
I think what you are hearing from the community (or at least from me) is
that our sites are not going to have any confusion over ampersands now or
in the near future, and that what is most appropriate for our sites is to
use ampersands now (and possibly semicolons in the future).
> > I will suggest that AOLserver/OpenNSD is not the only webserver that
> > currently understands ampersands but not semicolons. The question
> > becomes: must every webserver support not only the minimal standard but
> > all of the recommendations in order to use htDig, or is there some way
> > that htDig can be made to support all webservers easily while still
> > supporting the most conformant ones?
>
>This is the part I'm still having a bit of difficulty understanding.
>Not being familiar at all with AOLserver/OpenACS/OpenNSD, I don't
>know what parts of it need to look at query strings in URIs at all.
>I've never seen it as a web server problem per se, but rather a problem
>with CGI programs and wrapper scripts. I do have an Apache-centric
>world view, I admit, but in the scheme of things as I see it, the web
>server passes unprocessed query strings to the individual CGI programs.
>Does AOLserver process query strings itself before passing them on?
>Are various CGI programs integrated into the server in a monolithic sort
>of manner? How does htsearch fit into this picture?
AOLserver makes the raw URI and a parsed/processed set of parameters
available to the application. The vast majority of applications deal
purely with the parsed set of parameters, but not all. For instance, VXML
modules using the commercial Tellme service need access to the raw URI
since Tellme POSTs audio wavs to the webserver and passes other query
parameters in the URI as a query string. AOLserver doesn't handle that
situation, where a single request carries both GET-style and POST-style
data. There may be other such situations as well.
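For comparison, the dual-separator parsing suggested above is simple to do in whichever layer builds the parsed parameter set. This is a hypothetical C++ sketch, not AOLserver's or htsearch's actual parser. `parseQuery` is an illustrative name, and percent-decoding and '+'-to-space conversion are left out to keep it short.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Split a raw query string into name/value pairs, treating both '&' and ';'
// as pair separators. Empty segments (e.g. from "a=1&&b=2") are skipped.
// No percent-decoding is done here.
std::vector<std::pair<std::string, std::string>>
parseQuery(const std::string& query) {
    std::vector<std::pair<std::string, std::string>> params;
    std::string::size_type start = 0;
    while (start <= query.size()) {
        std::string::size_type end = query.find_first_of("&;", start);
        if (end == std::string::npos) end = query.size();
        std::string item = query.substr(start, end - start);
        if (!item.empty()) {
            std::string::size_type eq = item.find('=');
            if (eq == std::string::npos)
                params.emplace_back(item, "");  // bare name, empty value
            else
                params.emplace_back(item.substr(0, eq), item.substr(eq + 1));
        }
        start = end + 1;
    }
    return params;
}
```

A query string such as "words=foo;page=2&format=long" yields the same three pairs no matter which separator each pair used.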
As in the PHP case, I am exec'ing out to htsearch, but I don't need to
parse the URIs myself; they just make the round trip to the browser and back.
>Well, you do make a case that hasn't been made before, namely that this
>does seem to go beyond the reluctance to bring a few wrapper scripts
>or CGIs in line with the times. I guess I'm just a little surprised,
>given that HTML 4.0 has been around for well over 2 years now, at the
>inertia involved in conforming to it. Is AOLserver never going to
>adopt W3C's recommendations? Given AOL's size, I imagine that's a
>distinct possibility. They may be more inclined to follow the MS route
>of defining their own versions of existing standards.
You've certainly hit the nail on the head. AOLserver has actually been
very cooperative with the open source community, but they are very
AOL-centric, and if something doesn't touch on a core AOL need, it's hard
to get the elephant's attention.
But I suspect it's not just AOLserver. Netcraft lists 38 different
webservers with more than 5000 sites, and 53 webservers with more than 1000
sites (and many more with even smaller numbers). Now it is true that
the overwhelming majority of sites run Apache, Netscape, or IIS, but
there is certainly a healthy population of other webservers out there
that may be conforming but do not yet support semicolons.
>I think the semicolon separator should remain the default, but I wouldn't
>oppose a config attribute to change it. However, I think it would be
>wrong to make it a simple choice between ';' and '&' (i.e. as a boolean
>attribute), because it closes the door to the better choice of '&amp;'
>when that would work. So, maybe there should be a string attribute
>that defines the separator, with ';' being the default, and '&amp;'
>being the recommended alternative.
>
>If I'm not mistaken, using '&amp;' would still meet your requirements,
>as for you it seems to be a server issue, and the server should only see
>the simple ampersand decoded by the client and passed back to the server.
>This wouldn't cause htsearch to violate the HTML 4.0 standard.
Yes, I think you're right.
>The bare '&' as separator should be a last recourse, only for cases where
>the htsearch output must be processed directly by a wrapper program that
>can't be fixed to allow ';' or '&amp;'.
Agreed.
I believe the appropriate thing to do, then, is to add a configuration
parameter containing a string that defaults to ";" but that a site can
change to anything (?), with "&amp;" being the recommended alternative?
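As a rough sketch of what such a parameter would control, the helper below joins name/value pairs with a caller-supplied separator that defaults to ";". The name `buildQuery` is hypothetical. It does not show how the value would actually be registered in htcommon/defaults.cc or read in htsearch/Display.cc, and values are assumed to be URL-encoded already.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Join already-URL-encoded name/value pairs using a configurable separator.
// The default ";" mirrors the proposal; "&amp;" is the HTML-safe alternative
// and a bare "&" the last resort for wrappers that cannot be fixed.
std::string buildQuery(
        const std::vector<std::pair<std::string, std::string>>& params,
        const std::string& separator = ";") {
    std::string out;
    for (std::size_t i = 0; i < params.size(); ++i) {
        if (i > 0) out += separator;  // no leading or trailing separator
        out += params[i].first;
        out += '=';
        out += params[i].second;
    }
    return out;
}
```

Passing "&amp;" gives links that are valid HTML and that browsers turn back into plain ampersands, while a bare "&" stays available as the last resort.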
I can make the patch, but my skills are mainly in C, Java, and Tcl. I can
see where I could crudely hardcode something into htsearch/Display.cc, and
it looks as though new configuration parameters are entered in
htcommon/defaults.cc. Where else should I be looking?
Thanks once more,
Jerry Asher
=====================================================
Jerry Asher [EMAIL PROTECTED]
1678 Shattuck Avenue Suite 161 Tel: (510) 549-2980
Berkeley, CA 94709 Fax: (877) 311-8688
_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html