Re: [Bug-wget] wget-1.12.1-devel: 'Unsupported scheme' and windows pathnames

Ray Satiro Sat, 12 Dec 2009 22:17:57 -0800

Hi Steven,

On windows this is valid:
C:\Users\Internet\Desktop>if exist c://file.txt echo hi
hi

I have rarely seen anything output paths with double forward slashes, but 
"mixed" scripts could produce that format, though probably not intentionally.

I can't find any valid single-character schemes. Would it break anything to 
return false if the scheme name is only one character?  That would be stricter.

I rewrote url_has_scheme(); the patch is attached. It supersedes the patch I 
submitted for retr.c.

Jay

--- On Sat, 12/12/09, Steven Schubiger <[email protected]> wrote:

From: Steven Schubiger <[email protected]>
Subject: Re: [Bug-wget] wget-1.12.1-devel: 'Unsupported scheme' and windows 
pathnames
To: "Ray Satiro" <[email protected]>
Cc: [email protected]
Date: Saturday, December 12, 2009, 9:14 AM

Ray Satiro <[email protected]> wrote:
> I checked the source and it appears that the 'unsupported scheme' error is 
> caused because the path on windows can be misinterpreted as a URL. This 
> traces back to retrieve_from_file() in src/retr.c. 
> 
> 1.12.1-devel retrieve_from_file() is different from 1.11.4 in the way it 
> handles checking for a URL as the input file. Changes could be made to retr.c 
> retrieve_from_file() or maybe url.c url_has_scheme(), which doesn't really 
> validate any type of scheme.
> 
> Attached is a patch that eliminates the check in retrieve_from_file() for 
> url_has_scheme() and instead calls url_parse() to determine whether the input 
> file is a URL. This is probably sufficient since url_parse() checks 
> url_scheme().

I'd prefer patching url_has_scheme() to test for schemes more strictly;
the implementation would add a check for a double slash at the end of
the scheme and colon (which are part of the URL).
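
One possible reading of that stricter test is sketched below. It is not the
patch referred to in this message, only an illustration of the idea: scan the
scheme characters, then require "://" instead of a bare ':'. The name
has_scheme_with_slashes is hypothetical, and isalnum() again stands in for
wget's c_isalnum().

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Same character class as SCHEME_CHAR in src/url.c; isalnum() replaces
   wget's c_isalnum() here (an assumption made for this sketch). */
#define SCHEME_CHAR(ch) (isalnum ((unsigned char) (ch)) || (ch) == '-' || (ch) == '+')

/* Hypothetical name.  Returns true if the string begins with one or more
   scheme characters immediately followed by "://". */
bool
has_scheme_with_slashes (const char *url)
{
  const char *p = url;

  assert (p != NULL);
  /* The first char must be a scheme char. */
  if (!*p || !SCHEME_CHAR (*p))
    return false;
  ++p;
  /* Followed by zero or more scheme chars. */
  while (*p && SCHEME_CHAR (*p))
    ++p;
  /* Terminated by "://" rather than by ':' alone. */
  return strncmp (p, "://", 3) == 0;
}
```

This rejects C:\file.txt and c:/file.txt, but c://file.txt still passes,
which is the case the reply at the top of this message is concerned with.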

Attached a patch for review.

--- wget-1.12.1-devel-orig/src/url.c	2009-09-22 14:04:36 -0400
+++ wget-1.12.1-devel/src/url.c	2009-12-13 00:55:38 -0500
@@ -442,25 +442,34 @@
 }
 
 #define SCHEME_CHAR(ch) (c_isalnum (ch) || (ch) == '-' || (ch) == '+')
+/* Scheme characters should be alphanumeric or + or - */
 
-/* Return 1 if the URL begins with any "scheme", 0 otherwise.  As
-   currently implemented, it returns true if URL begins with
-   [-+a-zA-Z0-9]+: .  */
-
+/* Return 1 if a possible naming scheme specifier is found, 0 otherwise. */
 bool
 url_has_scheme (const char *url)
 {
   const char *p = url;
+  char *p2;
+  unsigned scheme_len, i;
 
-  /* The first char must be a scheme char. */
-  if (!*p || !SCHEME_CHAR (*p))
+  assert( p );
+  p2 = strstr( p, "://" );
+  /* strstr can return p if zero length string */
+
+  scheme_len = p2 > p ? p2 - p : 0;
+
+  /* if the possible scheme is only one character it is not valid */
+  /* also we could have something like C://file.txt */
+  if( scheme_len <= 1 )
     return false;
-  ++p;
-  /* Followed by 0 or more scheme chars. */
-  while (*p && SCHEME_CHAR (*p))
-    ++p;
-  /* Terminated by ':'. */
-  return *p == ':';
+
+  for( i = 0; i < scheme_len; ++i, ++p )
+  {
+    if( !SCHEME_CHAR( *p ) )
+      return false;
+  }
+
+  return true;
 }
 
 int
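
For anyone who wants to try the rewritten check without building wget, here
is a minimal standalone sketch of the same logic. It is not the patch as
submitted: the name strict_has_scheme is hypothetical, isalnum() stands in
for wget's c_isalnum(), and the strstr() result is tested against NULL
explicitly, since a relational comparison between a null pointer and the
start of the string is not defined by the C standard.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Same character class as SCHEME_CHAR in src/url.c; isalnum() replaces
   wget's c_isalnum() here (an assumption made for this sketch). */
#define SCHEME_CHAR(ch) (isalnum ((unsigned char) (ch)) || (ch) == '-' || (ch) == '+')

/* Hypothetical name.  Returns true only if the text before the first
   "://" is at least two characters long and consists solely of scheme
   characters. */
bool
strict_has_scheme (const char *url)
{
  const char *sep;
  size_t scheme_len, i;

  assert (url != NULL);

  sep = strstr (url, "://");
  if (sep == NULL)
    return false;               /* no "://" anywhere in the string */

  scheme_len = (size_t) (sep - url);

  /* A zero- or one-character scheme is rejected; this covers a drive
     letter such as C://file.txt. */
  if (scheme_len <= 1)
    return false;

  for (i = 0; i < scheme_len; ++i)
    if (!SCHEME_CHAR (url[i]))
      return false;

  return true;
}
```

In this sketch http://example.com/ is accepted, while c://file.txt and
C:\Users\file.txt are rejected. Note that a string with a scheme but no
"://", such as mailto:user@example.com, is rejected as well.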
