Greetings again Gilles,

Thanks for your feedback.  I've fixed up some of the 
sloppyness in the patch (patch.kurl, relative to 
htdig-3.2.0b4-20020922) and done some fairly thorough 
testing on it.  It now only removes leading slashes if 
there are at least the prescribed number.  It also stores 
the slash count as '0'+slashcount rather than 
'\1'+slashcount.  That retains the efficiency of using a 
single-character, but means that naive reading of the 
string gives the correct result (provided slashcount < 10).

I still haven't found out the "right" way to call test/url, 
but I have called it by hand, and the results match those 
of the original, with one exception.  It now normalises the 
host name (e.g. resolves aliases) for *all* protocols with 
an IP host, rather than just  http://.  It still only 
removes "index.html" from "http://";.

I have also expanded the tests in  test/url.cc  to test the 
new functionality, and to test alias resolution.  The patch 
is in "patch.test".  It would be good to have a file of the 
"correct" output of this file so that changes to URL.cc 
could be tested simply with  diff,  but I didn't know how 
to put that in the patch.

In the process of testing, I noticed that double slashes 
('//') are normalised out *after* resolving '../'.  This 
means that '/foo//../' resolves to '/foo/', when UNIX would 
resolve it to '/'.  Is the current behaviour deliberate, or 
an oversight?

Finally, what is the schedule/criterion for the release of 
3.2.0b4?  The KDE team won't base their code on snapshots, 
and so I'd like to expedite this...  Which of the items in 
Geoff's regular "current status" posts have to be addressed 
before the release?

Thanks!
Lachlan

On Wed, 18 Sep 2002 08:58, Gilles Detillieux wrote:
> According to Lachlan Andrew:
> > Here is a patch (url-format.patch) to allow the format
> > of external_protocol URLs be <protocol>:<path>, rather
> > than <protcol>://<host>/<path>. It seems to work in the
> > cases I've tested, but I'm not sure how to try it on
> > the test suite, so I hope I haven't broken anything
> > else...
>
> For the 1st one, I don't have time to test it myself, but
> I'd like to wait for comments/testing from other
> developers if any is forthcoming.
>
> To answer some of the questions you ask in the code of
> your patch, the .get() after a sub seemed to be needed
> with some compilers, to avoid warnings.  For whatever
> reason, we couldn't just assign the result of a .sub() to
> another String, even though .sub() is supposed to return
> a String.  The .get() gives us a (char *) from that, and
> everything seems to work well that way.
>
> On line 324 of URL.cc, you say "(should also check the
> slashes are actually there...)".  I agree, the code
> should do this.
>
> The way the slash count is encoded in the Dictionary
> entries is kludgy, but I think there are similar kludges
> elsewhere in the code.  It's also self-contained in one
> method of the URL class, so I don't have a problem with
> it.

-- 
Lachlan Andrew  Phone: +613 8344-3816 Fax: +613 8344-6678
Dept of Electrical and Electronic Engg          CRICOS Provider Code
University of Melbourne, Victoria, 3010  AUSTRALIA      00116K
diff -rc ./test/url.cc ../htdig-3.2.0b4-20020922-lha/test/url.cc
*** ./test/url.cc	Sat Jul 27 13:48:21 2002
--- ../htdig-3.2.0b4-20020922-lha/test/url.cc	Sun Sep 29 18:04:34 2002
***************
*** 34,44 ****
  
  // These should probably be tested individually
  // but for now, we'll just assume they're set to defaults
  static ConfigDefaults defaults[] = {
    { "allow_virtual_hosts", "true", 0 },
    { "case_sensitive", "true", 0 },
    { "remove_default_doc", "index.html", 0 },
!   { "server_aliases", "", 0 },
    { 0 }
  };
  
--- 34,46 ----
  
  // These should probably be tested individually
  // but for now, we'll just assume they're set to defaults
+ // (except for external protocol test).
  static ConfigDefaults defaults[] = {
+   { "external_protocols", "https:// dummy.transport help: dummy.transport", 0 },
    { "allow_virtual_hosts", "true", 0 },
    { "case_sensitive", "true", 0 },
    { "remove_default_doc", "index.html", 0 },
!   { "server_aliases", "alias.com:443=true.com:443", 0 },
    { 0 }
  };
  
***************
*** 120,125 ****
--- 122,131 ----
      {
        while (fgets(buffer, sizeof(buffer), urllist))
  	{
+ 	  buffer [sizeof(buffer) - 1] = '\0';	// make strlen() safe
+ 	  int len = strlen(buffer);
+ 	  if (len && buffer [len-1] == '\n')
+ 	      buffer [len-1] = '\0';		// remove trailing '\n'
  	  children.Add(new String(buffer));
  	}
        fclose(urllist);
***************
*** 131,137 ****
    while (fgets(buffer, sizeof(buffer), urllist))
      {
        parent = URL(buffer);
!       cout << "Parent: \n";
        parent.dump();
        if (params->test_children)
  	{
--- 137,143 ----
    while (fgets(buffer, sizeof(buffer), urllist))
      {
        parent = URL(buffer);
!       cout << "Parent: " << buffer << '(' << parent.signature().get() << ")\n";
        parent.dump();
        if (params->test_children)
  	{
***************
*** 139,144 ****
--- 145,151 ----
  	  children.Start_Get();
  	  while ((current = (String *)children.Get_Next()))
  	    {
+ 	      cout << "\nChild: " << current->get() << endl;
  	      child = URL(current->get(), parent);
  	      child.dump();
  	    }
diff -rc ./test/url.parents ../htdig-3.2.0b4-20020922-lha/test/url.parents
*** ./test/url.parents	Sat Jul 27 13:48:21 2002
--- ../htdig-3.2.0b4-20020922-lha/test/url.parents	Sun Sep 29 14:24:08 2002
***************
*** 33,35 ****
--- 33,43 ----
  file://localhost/opt/htdig/maindocs/index.html
  file://localhost:80/home/ghutchis/www/home.html
  http://www.htdig.org/cgi-bin/test.cgi?date=10/1/99#anchor1
+ https://test.com/life.html
+ https://test.com:803/./../../orgs/life.html
+ https://alias.com:8080/./../../orgs/life.html
+ https://alias.com/./../../orgs/life.html
+ https:/www.fail.com/
+ https:www.fail.com
+ help:/khelpcenter/
+ help:/khelpcenter/what-is-kde.html#what-is-kde-introduction
diff -rc ./htcommon/URL.cc ../htdig-3.2.0b4-20020922-lha/htcommon/URL.cc
*** ./htcommon/URL.cc	Sat Jul 27 13:48:14 2002
--- ../htdig-3.2.0b4-20020922-lha/htcommon/URL.cc	Sun Sep 29 18:13:13 2002
***************
*** 19,24 ****
--- 19,25 ----
  #endif /* HAVE_CONFIG_H */
  
  #include "URL.h"
+ #include "QuotedStringList.h"
  #include "Dictionary.h"
  #include "HtConfiguration.h"
  #include "StringMatch.h"
***************
*** 37,42 ****
--- 38,45 ----
  
  #define NNTP_DEFAULT_PORT 119
  
+ static Dictionary	*slashCount = 0;
+ 
  //*****************************************************************************
  // URL::URL()
  // Default Constructor
***************
*** 317,324 ****
  	_host = 0;
  	_port = 0;
  	_url = 0;
  	_path = p;
! 	if (strcmp((char*)_service, "file") == 0)
  	  _host = "localhost";
      }
      else
--- 320,336 ----
  	_host = 0;
  	_port = 0;
  	_url = 0;
+ 	if (p)		// if non-NULL, skip (some) leading slashes in path
+ 	{
+ 	    int i;
+ 	    for (i = slashes (_service); i > 0 && *p == '/'; i--)
+ 		p++;
+ 	    if (i)	// if fewer slashes than specified for protocol don't
+ 			// delete any. -> Backwards compatible (necessary??)
+ 		p -= slashes (_service) - i;
+ 	}
  	_path = p;
! 	if (strcmp((char*)_service, "file") == 0 || slashes (_service) < 2)
  	  _host = "localhost";
      }
      else
***************
*** 574,583 ****
      if (_service.length() == 0 || _normal)
  	return;
  
!     if (strcmp((char*)_service, "http") != 0)
  	return;
  
!     removeIndex(_path);
  
      //
      // Convert a hostname to an IP address
--- 586,599 ----
      if (_service.length() == 0 || _normal)
  	return;
  
!     
! //  if (strcmp((char*)_service, "http") != 0)
!     // if service specifies doesn't specify an IP host, don't normalize it
!     if (slashes (_service) != 2)
  	return;
  
!     if (strcmp ((char*)_service, "http") == 0)
! 	removeIndex(_path);
  
      //
      // Convert a hostname to an IP address
***************
*** 711,716 ****
--- 727,786 ----
  }
  
  //*****************************************************************************
+ // int URL::slash(const String &protocol)
+ // Returns number of slashes folowing the service name for protocol
+ //
+ int
+ URL::slashes(const String &protocol)
+ {
+     if (!slashCount)
+     {
+ 	HtConfiguration* config= HtConfiguration::config();
+ 	slashCount = new Dictionary();
+ 
+ 	slashCount->Add (String("mailto"), new String("0"));
+ 	slashCount->Add (String("news"),   new String("0"));
+ 	slashCount->Add (String("http"),   new String("2"));
+ 	slashCount->Add (String("ftp"),    new String("2"));
+ 	// file:///  has three, but the last counts as part of the path...
+ 	slashCount->Add (String("file"),   new String("2"));
+ 	
+ 	QuotedStringList	qsl(config->Find("external_protocols"), " \t");
+ 	String			from;
+ 	int			i;
+ 	int			sep,colon;
+ 
+ 	for (i = 0; qsl[i]; i += 2)
+ 	{
+ 	    from = qsl[i];
+ 	    sep = from.indexOf("->");
+ 	    if (sep != -1)
+ 		from = from.sub(0, sep).get();  // "get" aids portability...
+ 
+ 	    colon = from.indexOf(":");
+ 	    // if service specified as "help:/" or "man:", note trailing slashes
+ 	    // Default is 2.
+ 	    if (colon != -1)
+ 	    {
+ 		int i;
+ 		char count [2];
+ 		for (i = colon+1; from[i] == '/'; i++)
+ 		    ;
+ 		count [0] = i - colon + '0' - 1;
+ 		count [1] = '\0';
+ 		from = from.sub(0,colon).get();
+ 		slashCount->Add (from, new String (count));
+ 	    } else
+ 		slashCount->Add (from, new String ("2"));
+ 	}
+     }
+     
+     // Default to two slashes for unknown protocols
+     String *count = (String *)slashCount->Find(protocol);
+     return count ? (count->get()[0] - '0') : 2;
+ }
+ 
+ //*****************************************************************************
  // void URL::constructURL()
  // Constructs the _url member from everything else
  // Also ensures the port number is correct for the service
***************
*** 725,743 ****
      _url = _service;
      _url << ":";
  
!     if (!(strcmp((char*)_service, "news") == 0 ||
! 	  strcmp((char*)_service, "mailto") == 0 ))
! 	_url << "//";
  
!     if (strcmp((char*)_service, "file") != 0)
!       {
! 	if (_user.length())
! 	  _url << _user << '@';
! 	_url << _host;
!       }
  
!    if (_port != DefaultPort() && _port != 0)  // Different than the default port
!       _url << ':' << _port;
  
      _url << _path;
  }
--- 795,819 ----
      _url = _service;
      _url << ":";
  
!     // Add correct number of slashes after service name
!     int i;
!     for (i = slashes (_service); i > 0; i--)
!     {
! 	_url << "/";
!     }
  
!     if (slashes (_service) == 2)	// services specifying a particular
!     {					// IP host must begin "service://"
! 	if (strcmp((char*)_service, "file") != 0)
! 	  {
! 	    if (_user.length())
! 	      _url << _user << '@';
! 	    _url << _host;
! 	  }
  
!        if (_port != DefaultPort() && _port != 0)  // Different than the default port
! 	  _url << ':' << _port;
!     }
  
      _url << _path;
  }
diff -rc ./htcommon/URL.h ../htdig-3.2.0b4-20020922-lha/htcommon/URL.h
*** ./htcommon/URL.h	Sat Jul 27 13:48:14 2002
--- ../htdig-3.2.0b4-20020922-lha/htcommon/URL.h	Sun Sep 29 11:18:19 2002
***************
*** 69,74 ****
--- 69,76 ----
      void                normalizePath();
      void		ServerAlias();
      void		constructURL();
+     // Number of slashes following service specifier.  eg service("http")=2
+     static int		slashes(const String &);
  };
  
diff -rc ./htcommon/defaults.cc ../htdig-3.2.0b4-20020922-lha/htcommon/defaults.cc
*** ./htcommon/defaults.cc	Sun Sep 22 17:13:07 2002
--- ../htdig-3.2.0b4-20020922-lha/htcommon/defaults.cc	Sun Sep 29 11:22:21 2002
***************
*** 836,841 ****
--- 836,847 ----
  	  The external protocols are specified as pairs of strings, the first being the URL scheme that \
  	the script can handle while the second is the path to the script itself. If the second is \
  	quoted, then additional command-line arguments may be given.<br> \
+ 	If the external protocol does not contain a colon (:), it is assumed \
+ 	to have the standard format \
+ 	\"protocol://[usr[:password]@]address[:port]/path\". \
+ 	If it ends with a colon, then it is assumed to have the simpler format \
+ 	\"protocol:path\". If it ends with \"://\" then the standard form is \
+ 	again assumed. <br> \
  	  The program takes three command-line parameters, not counting any parameters already given  \
  	in the command string:<br> \
  	<em>protocol URL configuration-file</em><br> \
diff -rc ./htdig/ExternalTransport.cc ../htdig-3.2.0b4-20020922-lha/htdig/ExternalTransport.cc
*** ./htdig/ExternalTransport.cc	Sat Jul 27 13:48:15 2002
--- ../htdig-3.2.0b4-20020922-lha/htdig/ExternalTransport.cc	Sun Sep 29 11:20:08 2002
***************
*** 93,98 ****
--- 93,104 ----
  		to = from.sub(sep+2).get();
  		from = from.sub(0, sep).get();
  	    }
+ 
+ 	    // Recognise service specified as "https://"; rather than "https"
+ 	    sep = from.indexOf(":");
+ 	    if (sep != -1)
+ 		from = from.sub(0, sep).get();
+ 
  	    handlers->Add(from, new String(qsl[i + 1]));
  	    toTypes->Add(from, new String(to));
  	}

Reply via email to