Greetings again Gilles, Thanks for your feedback. I've fixed up some of the sloppyness in the patch (patch.kurl, relative to htdig-3.2.0b4-20020922) and done some fairly thorough testing on it. It now only removes leading slashes if there are at least the prescribed number. It also stores the slash count as '0'+slashcount rather than '\1'+slashcount. That retains the efficiency of using a single-character, but means that naive reading of the string gives the correct result (provided slashcount < 10).
I still haven't found out the "right" way to call test/url, but I have called it by hand, and the results match those of the original, with one exception. It now normalises the host name (e.g. resolves aliases) for *all* protocols with an IP host, rather than just http://. It still only removes "index.html" from "http://". I have also expanded the tests in test/url.cc to test the new functionality, and to test alias resolution. The patch is in "patch.test". It would be good to have a file of the "correct" output of this file so that changes to URL.cc could be tested simply with diff, but I didn't know how to put that in the patch. In the process of testing, I noticed that double slashes ('//') are normalised out *after* resolving '../'. This means that '/foo//../' resolves to '/foo/', when UNIX would resolve it to '/'. Is the current behaviour deliberate, or an oversight? Finally, what is the schedule/criterion for the release of 3.2.0b4? The KDE team won't base their code on snapshots, and so I'd like to expedite this... Which of the items in Geoff's regular "current status" posts have to be addressed before the release? Thanks! Lachlan On Wed, 18 Sep 2002 08:58, Gilles Detillieux wrote: > According to Lachlan Andrew: > > Here is a patch (url-format.patch) to allow the format > > of external_protocol URLs be <protocol>:<path>, rather > > than <protcol>://<host>/<path>. It seems to work in the > > cases I've tested, but I'm not sure how to try it on > > the test suite, so I hope I haven't broken anything > > else... > > For the 1st one, I don't have time to test it myself, but > I'd like to wait for comments/testing from other > developers if any is forthcoming. > > To answer some of the questions you ask in the code of > your patch, the .get() after a sub seemed to be needed > with some compilers, to avoid warnings. For whatever > reason, we couldn't just assign the result of a .sub() to > another String, even though .sub() is supposed to return > a String. The .get() gives us a (char *) from that, and > everything seems to work well that way. > > On line 324 of URL.cc, you say "(should also check the > slashes are actually there...)". I agree, the code > should do this. > > The way the slash count is encoded in the Dictionary > entries is kludgy, but I think there are similar kludges > elsewhere in the code. It's also self-contained in one > method of the URL class, so I don't have a problem with > it. -- Lachlan Andrew Phone: +613 8344-3816 Fax: +613 8344-6678 Dept of Electrical and Electronic Engg CRICOS Provider Code University of Melbourne, Victoria, 3010 AUSTRALIA 00116K
diff -rc ./test/url.cc ../htdig-3.2.0b4-20020922-lha/test/url.cc *** ./test/url.cc Sat Jul 27 13:48:21 2002 --- ../htdig-3.2.0b4-20020922-lha/test/url.cc Sun Sep 29 18:04:34 2002 *************** *** 34,44 **** // These should probably be tested individually // but for now, we'll just assume they're set to defaults static ConfigDefaults defaults[] = { { "allow_virtual_hosts", "true", 0 }, { "case_sensitive", "true", 0 }, { "remove_default_doc", "index.html", 0 }, ! { "server_aliases", "", 0 }, { 0 } }; --- 34,46 ---- // These should probably be tested individually // but for now, we'll just assume they're set to defaults + // (except for external protocol test). static ConfigDefaults defaults[] = { + { "external_protocols", "https:// dummy.transport help: dummy.transport", 0 }, { "allow_virtual_hosts", "true", 0 }, { "case_sensitive", "true", 0 }, { "remove_default_doc", "index.html", 0 }, ! { "server_aliases", "alias.com:443=true.com:443", 0 }, { 0 } }; *************** *** 120,125 **** --- 122,131 ---- { while (fgets(buffer, sizeof(buffer), urllist)) { + buffer [sizeof(buffer) - 1] = '\0'; // make strlen() safe + int len = strlen(buffer); + if (len && buffer [len-1] == '\n') + buffer [len-1] = '\0'; // remove trailing '\n' children.Add(new String(buffer)); } fclose(urllist); *************** *** 131,137 **** while (fgets(buffer, sizeof(buffer), urllist)) { parent = URL(buffer); ! cout << "Parent: \n"; parent.dump(); if (params->test_children) { --- 137,143 ---- while (fgets(buffer, sizeof(buffer), urllist)) { parent = URL(buffer); ! cout << "Parent: " << buffer << '(' << parent.signature().get() << ")\n"; parent.dump(); if (params->test_children) { *************** *** 139,144 **** --- 145,151 ---- children.Start_Get(); while ((current = (String *)children.Get_Next())) { + cout << "\nChild: " << current->get() << endl; child = URL(current->get(), parent); child.dump(); } diff -rc ./test/url.parents ../htdig-3.2.0b4-20020922-lha/test/url.parents *** ./test/url.parents Sat Jul 27 13:48:21 2002 --- ../htdig-3.2.0b4-20020922-lha/test/url.parents Sun Sep 29 14:24:08 2002 *************** *** 33,35 **** --- 33,43 ---- file://localhost/opt/htdig/maindocs/index.html file://localhost:80/home/ghutchis/www/home.html http://www.htdig.org/cgi-bin/test.cgi?date=10/1/99#anchor1 + https://test.com/life.html + https://test.com:803/./../../orgs/life.html + https://alias.com:8080/./../../orgs/life.html + https://alias.com/./../../orgs/life.html + https:/www.fail.com/ + https:www.fail.com + help:/khelpcenter/ + help:/khelpcenter/what-is-kde.html#what-is-kde-introduction
diff -rc ./htcommon/URL.cc ../htdig-3.2.0b4-20020922-lha/htcommon/URL.cc *** ./htcommon/URL.cc Sat Jul 27 13:48:14 2002 --- ../htdig-3.2.0b4-20020922-lha/htcommon/URL.cc Sun Sep 29 18:13:13 2002 *************** *** 19,24 **** --- 19,25 ---- #endif /* HAVE_CONFIG_H */ #include "URL.h" + #include "QuotedStringList.h" #include "Dictionary.h" #include "HtConfiguration.h" #include "StringMatch.h" *************** *** 37,42 **** --- 38,45 ---- #define NNTP_DEFAULT_PORT 119 + static Dictionary *slashCount = 0; + //***************************************************************************** // URL::URL() // Default Constructor *************** *** 317,324 **** _host = 0; _port = 0; _url = 0; _path = p; ! if (strcmp((char*)_service, "file") == 0) _host = "localhost"; } else --- 320,336 ---- _host = 0; _port = 0; _url = 0; + if (p) // if non-NULL, skip (some) leading slashes in path + { + int i; + for (i = slashes (_service); i > 0 && *p == '/'; i--) + p++; + if (i) // if fewer slashes than specified for protocol don't + // delete any. -> Backwards compatible (necessary??) + p -= slashes (_service) - i; + } _path = p; ! if (strcmp((char*)_service, "file") == 0 || slashes (_service) < 2) _host = "localhost"; } else *************** *** 574,583 **** if (_service.length() == 0 || _normal) return; ! if (strcmp((char*)_service, "http") != 0) return; ! removeIndex(_path); // // Convert a hostname to an IP address --- 586,599 ---- if (_service.length() == 0 || _normal) return; ! ! // if (strcmp((char*)_service, "http") != 0) ! // if service specifies doesn't specify an IP host, don't normalize it ! if (slashes (_service) != 2) return; ! if (strcmp ((char*)_service, "http") == 0) ! removeIndex(_path); // // Convert a hostname to an IP address *************** *** 711,716 **** --- 727,786 ---- } //***************************************************************************** + // int URL::slash(const String &protocol) + // Returns number of slashes folowing the service name for protocol + // + int + URL::slashes(const String &protocol) + { + if (!slashCount) + { + HtConfiguration* config= HtConfiguration::config(); + slashCount = new Dictionary(); + + slashCount->Add (String("mailto"), new String("0")); + slashCount->Add (String("news"), new String("0")); + slashCount->Add (String("http"), new String("2")); + slashCount->Add (String("ftp"), new String("2")); + // file:/// has three, but the last counts as part of the path... + slashCount->Add (String("file"), new String("2")); + + QuotedStringList qsl(config->Find("external_protocols"), " \t"); + String from; + int i; + int sep,colon; + + for (i = 0; qsl[i]; i += 2) + { + from = qsl[i]; + sep = from.indexOf("->"); + if (sep != -1) + from = from.sub(0, sep).get(); // "get" aids portability... + + colon = from.indexOf(":"); + // if service specified as "help:/" or "man:", note trailing slashes + // Default is 2. + if (colon != -1) + { + int i; + char count [2]; + for (i = colon+1; from[i] == '/'; i++) + ; + count [0] = i - colon + '0' - 1; + count [1] = '\0'; + from = from.sub(0,colon).get(); + slashCount->Add (from, new String (count)); + } else + slashCount->Add (from, new String ("2")); + } + } + + // Default to two slashes for unknown protocols + String *count = (String *)slashCount->Find(protocol); + return count ? (count->get()[0] - '0') : 2; + } + + //***************************************************************************** // void URL::constructURL() // Constructs the _url member from everything else // Also ensures the port number is correct for the service *************** *** 725,743 **** _url = _service; _url << ":"; ! if (!(strcmp((char*)_service, "news") == 0 || ! strcmp((char*)_service, "mailto") == 0 )) ! _url << "//"; ! if (strcmp((char*)_service, "file") != 0) ! { ! if (_user.length()) ! _url << _user << '@'; ! _url << _host; ! } ! if (_port != DefaultPort() && _port != 0) // Different than the default port ! _url << ':' << _port; _url << _path; } --- 795,819 ---- _url = _service; _url << ":"; ! // Add correct number of slashes after service name ! int i; ! for (i = slashes (_service); i > 0; i--) ! { ! _url << "/"; ! } ! if (slashes (_service) == 2) // services specifying a particular ! { // IP host must begin "service://" ! if (strcmp((char*)_service, "file") != 0) ! { ! if (_user.length()) ! _url << _user << '@'; ! _url << _host; ! } ! if (_port != DefaultPort() && _port != 0) // Different than the default port ! _url << ':' << _port; ! } _url << _path; } diff -rc ./htcommon/URL.h ../htdig-3.2.0b4-20020922-lha/htcommon/URL.h *** ./htcommon/URL.h Sat Jul 27 13:48:14 2002 --- ../htdig-3.2.0b4-20020922-lha/htcommon/URL.h Sun Sep 29 11:18:19 2002 *************** *** 69,74 **** --- 69,76 ---- void normalizePath(); void ServerAlias(); void constructURL(); + // Number of slashes following service specifier. eg service("http")=2 + static int slashes(const String &); }; diff -rc ./htcommon/defaults.cc ../htdig-3.2.0b4-20020922-lha/htcommon/defaults.cc *** ./htcommon/defaults.cc Sun Sep 22 17:13:07 2002 --- ../htdig-3.2.0b4-20020922-lha/htcommon/defaults.cc Sun Sep 29 11:22:21 2002 *************** *** 836,841 **** --- 836,847 ---- The external protocols are specified as pairs of strings, the first being the URL scheme that \ the script can handle while the second is the path to the script itself. If the second is \ quoted, then additional command-line arguments may be given.<br> \ + If the external protocol does not contain a colon (:), it is assumed \ + to have the standard format \ + \"protocol://[usr[:password]@]address[:port]/path\". \ + If it ends with a colon, then it is assumed to have the simpler format \ + \"protocol:path\". If it ends with \"://\" then the standard form is \ + again assumed. <br> \ The program takes three command-line parameters, not counting any parameters already given \ in the command string:<br> \ <em>protocol URL configuration-file</em><br> \ diff -rc ./htdig/ExternalTransport.cc ../htdig-3.2.0b4-20020922-lha/htdig/ExternalTransport.cc *** ./htdig/ExternalTransport.cc Sat Jul 27 13:48:15 2002 --- ../htdig-3.2.0b4-20020922-lha/htdig/ExternalTransport.cc Sun Sep 29 11:20:08 2002 *************** *** 93,98 **** --- 93,104 ---- to = from.sub(sep+2).get(); from = from.sub(0, sep).get(); } + + // Recognise service specified as "https://" rather than "https" + sep = from.indexOf(":"); + if (sep != -1) + from = from.sub(0, sep).get(); + handlers->Add(from, new String(qsl[i + 1])); toTypes->Add(from, new String(to)); }