Hi guys,
I don't mean to be a pest, but I was wondering if any decision was made
regarding eliminating and ?somedata tags from a URL when doing a
file-system type scan.
Any info about this would be appreciated.
As always, thanks for your time!
Sincerely,
Mike
On Mon, 19 Feb 2001, Gilles Detillieux wrote:
> Date: Mon, 19 Feb 2001 14:05:19 -0600 (CST)
> From: Gilles Detillieux <[EMAIL PROTECTED]>
> To: Geoff Hutchison <[EMAIL PROTECTED]>
> Cc: Gilles Detillieux <[EMAIL PROTECTED]>,
> Michael J.Fiorill <[EMAIL PROTECTED]>,
> [EMAIL PROTECTED]
> Subject: Re: [htdig] Re: HTdig Change
>
> According to Geoff Hutchison:
> > On Fri, 16 Feb 2001, Gilles Detillieux wrote:
> > > I disagree with this. I think htdig is making a safe assumption in
> > > treating the query strings as significant.
> >
> > My argument would be that if a file has a query, it should really be
> > treated through the HTTP server (for server parsing, etc.) *or* perhaps as
> > an option, the query should be ignored for local filesystem indexing. I
> > guess my point can be summed up that when looking for a file on the
> > system, it should not be a file "index.html?1723" but "index.html" if the
> > query is to be ignored.
>
> OK, I guess I misunderstood what you were saying. Mike Fiorill wanted
> the query string stripped off when indexing by the local file system,
> and you seemed to be agreeing with that. My point was that anything
> with a query string should fall back to HTTP, unless some other option
> explicitly requests stripping of query strings. My concern is that
> local_urls handling should not affect what document htdig gets, only
> the method by which it gets it, and in the majority of cases a query
> string has an effect on what document gets fetched, so it should not be
> simply ignored. That should be handled by a different config attribute.
>
> You're right, though, that htdig shouldn't be looking for a file name
> like "index.html?1723" on the local filesystem. It does this now, and
> it's only because the lookup normally fails that it falls back to HTTP.
> I think this is the case for all versions of htdig since local_urls was
> added in the early 3.1.x betas.
>
> > > to the handling of local_urls, though, because there are cases where
> > > users wanted query string stripping even for HTTP-based digging. I think
> >
> > Yes, removing query strings is a separate matter, but as I said above, the
> > file code should never try to look up "index.html?1723". That's just
> > not how the URL is to be parsed by the RFCs. If you have a legitimate
> > question-mark in a filename, it has to be an encoded one.
>
> That's right. But does it have to be SGML encoded or %xx hex encoded?
> The way htdig works now, as of 3.1.4, is to decode any SGML encoding in
> the entire URL before it breaks down the URL into its component parts.
> So, if I'm not mistaken, when it pops a URL off the server queue,
> it has no way of knowing if a "?" in the URL (or any other character
> for that matter) had been SGML-encoded or not. So, if an encoded question
> mark can legitimately be part of a file name, maybe the code is working
> correctly the way it is now. The only problem is the small chance of a
> "false positive" match if an unencoded question mark and query string
> happen to match an existing file name on the local file system. I think
> the only way to make certain the code can distinguish an unencoded "?"
> from an SGML-encoded one would be to dissect the URL before SGML decoding.
>
> On the other hand, hex encoding would be easier, as that's normally
> left up to the HTTP server. The local_urls handling would be able to
> destinguish between an unencoded "?" and a "%3F" quite easily. Up to
> version 3.1.4, it didn't do any decoding of these, though, so they
> would likely have failed and fallen back to HTTP. As of version 3.1.5,
> it decodes these for the whole URL, so if we were to add a test for a
> query string, it should be before the hex-decoding.
>
>
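To make the ordering in the quoted discussion concrete, here is a rough sketch of the check being proposed: look for a raw "?" before any %xx decoding, fall back to HTTP when a query string is present, and only then decode the path for the local lookup. This is illustrative Python, not htdig code (htdig is written in C++). The function name local_path_for, its parameters, and the strip_query flag are hypothetical; strip_query just stands in for the separate config attribute mentioned in the thread, which has no name yet.

```python
from urllib.parse import unquote


def local_path_for(url_path, prefix, local_root, strip_query=False):
    """Map a URL path to a local file path, or return None to use HTTP.

    Hypothetical helper: prefix/local_root play the role of one
    local_urls mapping (URL prefix -> directory on disk).
    """
    # Split on a raw "?" BEFORE hex-decoding, so that "%3F" in a file
    # name is not mistaken for the start of a query string.
    path, sep, _query = url_path.partition("?")

    # A query string usually changes which document is returned, so by
    # default leave such URLs to the HTTP server.
    if sep and not strip_query:
        return None

    # URL is outside this mapping: not ours to handle locally.
    if not path.startswith(prefix):
        return None

    # Only now decode %xx escapes; "%3F" becomes a literal "?".
    return local_root + unquote(path[len(prefix):])
```

With a mapping of "/docs/" to "/var/www/docs/", the path "/docs/index.html?1723" returns None (fall back to HTTP) unless strip_query is set, in which case it maps to "/var/www/docs/index.html". A path such as "/docs/what%3F.html" maps to the file "/var/www/docs/what?.html". The sketch only covers %xx encoding; it does not address the SGML-entity case, where the decoding has already happened by the time the URL is dissected.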
<[EMAIL PROTECTED], K3MIX>-------------------------------------------------
____ __ ____ Digital Indigo Technologies
/ / /./_/ /_ /__ . __ _ . / / Lancaster, Pennsylvania, U.S.A.
/ / ///\ /_ / / /_// / / / On-line at http://www.digitalindigo.com