Thanks Jim,

I had a feeling htdig was doing this for a good reason, but it's good to
know for sure. 
I think I'll change my CGIs to place a dummy value in the URL instead
of hacking URL.cc. This will also bring the URLs in line with the standard.

Thanks again

Craig


On Mon, 2003-07-07 at 23:06, Jim Cole wrote:
> On Monday, July 7, 2003, at 03:52 PM, Craig Taylor wrote:
> 
> > I have URLs on my site that are in the format:
> > http://www.whatever.com/test.cgi//88
> >
> > When I run htdig and htmerge I get search results with the url above
> > changed to: http://www.whatever.com/test.cgi/88 missing the second
> > forward slash.
> 
> The problem is that the URLs you are using are not valid. Although they 
> may work fine in other situations, they do not comply with the RFC that 
> defines URLs. You are trying to pass a '/' as ordinary data while the 
> RFC defines the '/' as a reserved character. At least that is my 
> reading. Since consecutive '/' characters are not allowed for, htdig 
> collapses them into a single '/'. The reason that htdig goes out of its 
> way to change URLs in this fashion is that they have the potential to 
> create loops as htdig spiders through a site.
> 
> My guess is that you are not going to find an easy workaround. 
> Normalization of the URL path occurs very early in the process, during 
> the initial parse of the URL. If you want to look at the code that does 
> the normalization, and perhaps hack it to your needs, look for the 
> normalizePath() method in URL.cc.
> 
> Jim
> 
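
For anyone following along, the slash-collapsing behaviour described above
can be sketched roughly as follows. This is only an illustration, not the
actual normalizePath() code from URL.cc: the function name collapseSlashes
is hypothetical, and it assumes it is handed just the path portion of the
URL (everything after the host), so the "//" in "http://" never reaches it.

```cpp
#include <string>

// Hypothetical helper for illustration only; this is NOT ht://Dig's
// normalizePath(). It collapses every run of consecutive '/' characters
// in a URL path into a single '/'.
std::string collapseSlashes(const std::string& path) {
    std::string out;
    out.reserve(path.size());
    for (char c : path) {
        // Drop a '/' that directly follows another '/' already kept.
        if (c == '/' && !out.empty() && out.back() == '/') {
            continue;
        }
        out.push_back(c);
    }
    return out;
}
```

Under this sketch, "/test.cgi//88" comes back as "/test.cgi/88", which
matches what shows up in the search results. A path with a dummy value
between the slashes, say "/test.cgi/x/88" (the "x" is just a made-up
placeholder), has no empty segment and passes through unchanged.
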
-- 
Politics is supposed to be the second-oldest profession. I have come to
realize that it bears a very close resemblance to the first.
--Ronald Reagan


