aseek-devel  

[aseek-devel] parse.cpp.patch

Jens Thoms Toerring
Tue, 26 Aug 2003 16:58:07 +0000

This is a patch I haven't mentioned yet. It's actually for two
separate problems (sorry, Kir).

The first one is related to "Location:" entries in the header
the server sends. According to RFC2616 it should be followed by
an absoluteURI, but on some machines there's just an absolute
path to a different page on the server instead. The first part
of the patch deals with this more gracefully.

The second part is for cases where a document contains links
that start with a slash, i.e. something like

<a href="/foo/bar/xxx.html">

As far as I can see these are not treated correctly (aspseek does
not seem to follow these links), and that's what the second part
of the patch is for.
                                    Regards, Jens
-- 
 Freie Universitaet Berlin     Jens Thoms Toerring
 Universitaetsbibliothek
 Webteam                       Tel: 0049 30 838 56055
 Garystrasse 39                Fax: 0049 30 838 53738
 14195 Berlin                  e-mail: [EMAIL PROTECTED]


--- aspseek-orig/src/parse.cpp  2003-08-19 13:50:25.000000000 +0200
+++ aspseek-my/src/parse.cpp    2003-08-26 18:38:21.000000000 +0200
@@ -876,6 +876,14 @@
                                string location_unescaped;
                                char *location_trim = str_trim(location);
                                URIUnescapeSGML(location_trim, location_unescaped, ucontent.m_charset);
+
+                               // If the URI isn't RFC2616 conformant, i.e. isn't an absoluteURI
+                               // but just a path, prepend the server name to it in the hope
+                               // of turning it into an absoluteURI...
+
+                               if ( *location_unescaped.c_str() == '/' )
+                                       location_unescaped = m_url + location_unescaped;
+
                                if (!newURL.ParseURL(location_unescaped.c_str()))
                                {
                                        int newMethod;
@@ -2187,6 +2195,13 @@
                                string href_unescaped;
                                char *href_trim = str_trim(href);
                                URIUnescapeSGML(href_trim, href_unescaped, ucontent->m_charset);
+
+                               // Prepend the server name etc. to the reference if it's an
+                               // absolute path, otherwise we get into trouble later
+
+                               if ( *href_unescaped.c_str() == '/' )
+                                       href_unescaped = CurSrv->m_url + href_unescaped;
+
                                if (doc->m_hops >= CurSrv->m_maxhops)
                                {
                                }