On Monday, July 7, 2003, at 03:52 PM, Craig Taylor wrote:

I have URLs on my site that are in the format:
http://www.whatever.com/test.cgi//88

When I run htdig and htmerge, I get search results with the URL above
changed to http://www.whatever.com/test.cgi/88, which is missing the second
forward slash.

The problem is that the URLs you are using are not valid, at least as I read the RFC that defines URLs. They may work fine in other situations, but you are trying to pass a '/' as ordinary data, and the RFC defines '/' as a reserved character. Because consecutive '/' characters are not provided for, htdig collapses them into a single '/'. htdig goes out of its way to change URLs this way because runs of slashes have the potential to create loops as htdig spiders through a site.
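
To illustrate the kind of collapsing described above, here is a minimal Python sketch. It is not htdig's code (htdig is C++, and its real logic is in URL.cc); the function name collapse_slashes is made up for this example, and it only mimics the visible effect on the path.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_slashes(url: str) -> str:
    """Collapse runs of '/' in the path component of a URL.

    Hypothetical helper for illustration only; it approximates the
    effect described in this message, not htdig's actual implementation.
    """
    parts = urlsplit(url)
    # Only the path is rewritten; the "//" after the scheme and any
    # slashes in the query string or fragment are left untouched.
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


print(collapse_slashes("http://www.whatever.com/test.cgi//88"))
```

Run against the URL from the original question, this produces the same single-slash form that shows up in the search results, which is why the '//88' style cannot survive indexing.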


My guess is that you are not going to find an easy workaround. Normalization of the URL path occurs very early in the process, during the initial parse of the URL. If you want to look at the code that does the normalization, and perhaps hack it to your needs, look for the normalizePath() method in URL.cc.

Jim


