Jens Thoms Toerring
Tue, 26 Aug 2003 16:58:07 +0000
This is a patch I haven't mentioned yet. It's actually for two
separate problems (sorry, Kir).
The first one is related to "Location:" entries in the header
the server sends. According to RFC 2616 it should be followed by
an absoluteURI, but on some machines there's just an absolute
path to a different page on the same server instead. The first part
of the patch is to deal with this more gracefully.
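To illustrate the distinction, here is a minimal sketch that tells an absoluteURI apart from a bare absolute path. The names LocationKind and classifyLocation are hypothetical and not part of aspseek. Unlike the simple leading-slash test in the patch, the sketch also treats a value starting with "//" separately, on the assumption that such a value names a different host rather than a path on the same server.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of aspseek: classifies a Location value.
enum class LocationKind { AbsoluteURI, AbsolutePath, Other };

LocationKind classifyLocation(const std::string& value)
{
    // Assumption: a leading "//" is a network-path reference (a host
    // follows), so it is not a path on the current server.
    if (value.size() >= 2 && value[0] == '/' && value[1] == '/')
        return LocationKind::Other;

    // A single leading slash: absolute path on the same server.
    if (!value.empty() && value[0] == '/')
        return LocationKind::AbsolutePath;

    // An absoluteURI starts with a scheme: a letter followed by letters,
    // digits, '+', '-' or '.', terminated by ':'.
    if (value.empty() || !std::isalpha(static_cast<unsigned char>(value[0])))
        return LocationKind::Other;
    for (std::size_t i = 1; i < value.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(value[i]);
        if (c == ':')
            return LocationKind::AbsoluteURI;
        if (!std::isalnum(c) && c != '+' && c != '-' && c != '.')
            break;
    }
    return LocationKind::Other;
}
```

Only values classified as AbsolutePath would need the server name prepended; anything else can be handed to the existing URL parser unchanged.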
The second part is for cases where there are links in a document
that start with a slash, i.e. something like
<a href="/foo/bar/xxx.html">
As far as I can see these are not treated correctly (aspseek does
not seem to follow these links), and that's what the second part
of the patch is for.
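As a rough sketch of the idea behind both parts, an absolute path can be turned into a full URL by prepending the server root. The function name resolveAbsolutePath is hypothetical, and serverRoot is assumed to hold only scheme and host (for example "http://www.example.com"); whether m_url in aspseek has exactly that form, or carries a trailing slash, is not stated here, so the sketch strips trailing slashes to be safe.

```cpp
#include <string>

// Hypothetical helper, not part of aspseek.
// Assumption: serverRoot is scheme plus host, e.g. "http://www.example.com".
std::string resolveAbsolutePath(const std::string& serverRoot,
                                const std::string& ref)
{
    // Leave everything that is not an absolute path untouched,
    // including "//host/..." references, which name another host.
    if (ref.empty() || ref[0] != '/' || (ref.size() > 1 && ref[1] == '/'))
        return ref;

    // Drop trailing slashes from the root so the result does not
    // contain a doubled slash between host and path.
    std::string root = serverRoot;
    while (!root.empty() && root.back() == '/')
        root.pop_back();

    return root + ref;
}
```

With these assumptions, resolveAbsolutePath("http://www.example.com", "/foo/bar/xxx.html") yields "http://www.example.com/foo/bar/xxx.html", and a relative reference such as "xxx.html" is returned unchanged.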
Regards, Jens
--
Freie Universitaet Berlin Jens Thoms Toerring
Universitaetsbibliothek
Webteam Tel: 0049 30 838 56055
Garystrasse 39 Fax: 0049 30 838 53738
14195 Berlin e-mail: [EMAIL PROTECTED]
--- aspseek-orig/src/parse.cpp 2003-08-19 13:50:25.000000000 +0200
+++ aspseek-my/src/parse.cpp 2003-08-26 18:38:21.000000000 +0200
@@ -876,6 +876,14 @@
string location_unescaped;
char *location_trim = str_trim(location);
URIUnescapeSGML(location_trim, location_unescaped, ucontent.m_charset);
+
+ // If the URI isn't RFC2616 conform, i.e. isn't a absoluteURI
+ // but just a path prepend it by the server name in the hope
+ // to make it an absoluteURI...
+
+ if ( *location_unescaped.c_str() == '/' )
+ location_unescaped = m_url + location_unescaped;
+
if (!newURL.ParseURL(location_unescaped.c_str()))
{
int newMethod;
@@ -2187,6 +2195,13 @@
string href_unescaped;
char *href_trim = str_trim(href);
URIUnescapeSGML(href_trim, href_unescaped, ucontent->m_charset);
+
+ // Prepend the reference with the server name etc. if it's an
+ // absolute path, otherwise we get in trouble later
+
+ if ( *href_unescaped.c_str() == '/' )
+ href_unescaped = CurSrv->m_url + href_unescaped;
+
if (doc->m_hops >= CurSrv->m_maxhops)
{
}