Hi Steven,
On Windows this is valid:
C:\Users\Internet\Desktop>if exist c://file.txt echo hi
hi
I have rarely seen anything output paths with double forward slashes, but
"mixed" scripts could produce that format, although probably not intentionally.
I can't find any valid single character schemes. Would it break anything to
return false if the scheme name is only one character? That would be stricter.
I rewrote url_has_scheme(); the patch is attached. It supersedes the patch I
submitted for retr.c.
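
As a rough standalone illustration of the stricter check described above (a
sketch, not the attached patch itself): the names has_scheme_strict() and
scheme_char() are hypothetical, and isalnum() is assumed here in place of
wget's c_isalnum(), so this version is locale-dependent.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for wget's SCHEME_CHAR macro.  Assumption:
   isalnum() is used instead of c_isalnum(). */
static bool scheme_char(char ch)
{
    return isalnum((unsigned char) ch) || ch == '-' || ch == '+';
}

/* Return true if url starts with two or more scheme characters
   immediately followed by "://". */
static bool has_scheme_strict(const char *url)
{
    const char *sep;
    size_t len, i;

    if (url == NULL)
        return false;

    sep = strstr(url, "://");
    if (sep == NULL)
        return false;

    len = (size_t) (sep - url);
    if (len <= 1)               /* rejects "://x" and "C://file.txt" */
        return false;

    for (i = 0; i < len; ++i)
        if (!scheme_char(url[i]))
            return false;

    return true;
}
```

With this sketch, "http://example.com" is accepted, while "C://file.txt",
"c:\file.txt" and the empty string are rejected. It also rejects anything
written without "//" after the colon.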
Jay
--- On Sat, 12/12/09, Steven Schubiger <[email protected]> wrote:
From: Steven Schubiger <[email protected]>
Subject: Re: [Bug-wget] wget-1.12.1-devel: 'Unsupported scheme' and windows
pathnames
To: "Ray Satiro" <[email protected]>
Cc: [email protected]
Date: Saturday, December 12, 2009, 9:14 AM
Ray Satiro <[email protected]> wrote:
> I checked the source and it appears that the 'unsupported scheme' error is
> caused because the path on windows can be misinterpreted as a URL. This
> traces back to retrieve_from_file() in src/retr.c.
>
> 1.12.1-devel retrieve_from_file() is different from 1.11.4 in the way it
> handles checking for a URL as the input file. Changes could be made to retr.c
> retrieve_from_file() or maybe url.c url_has_scheme(), which doesn't really
> validate any type of scheme.
>
> Attached is a patch that eliminates the check in retrieve_from_file() for
> url_has_scheme() and instead checks url_parse() to determine whether the input
> file is a URL. This is probably sufficient since url_parse() checks
> url_scheme().
I'd prefer patching url_has_scheme() to test for schemes more strictly;
the implementation would add a check for a double slash after the scheme
name and colon (both are part of the URL).
I've attached a patch for review.
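
The check described here could look roughly like the following. This is a
sketch of the idea only, not the patch referred to above: the names
has_scheme_with_slashes() and is_scheme_char() are hypothetical, and
isalnum() is assumed in place of wget's c_isalnum().

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for wget's SCHEME_CHAR macro.  Assumption:
   isalnum() is used instead of c_isalnum(). */
static bool is_scheme_char(char ch)
{
    return isalnum((unsigned char) ch) || ch == '-' || ch == '+';
}

/* Return true if url begins with one or more scheme characters,
   then ':', then "//". */
static bool has_scheme_with_slashes(const char *url)
{
    const char *p = url;

    /* The first char must be a scheme char. */
    if (p == NULL || !is_scheme_char(*p))
        return false;
    ++p;

    /* Followed by zero or more scheme chars. */
    while (is_scheme_char(*p))
        ++p;

    /* Terminated by "://"; short-circuiting stops at the NUL byte. */
    return p[0] == ':' && p[1] == '/' && p[2] == '/';
}
```

This variant rejects "c:\file.txt" but still accepts "c://file.txt", since a
single-character scheme name is allowed.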
--- wget-1.12.1-devel-orig/src/url.c 2009-09-22 14:04:36 -0400
+++ wget-1.12.1-devel/src/url.c 2009-12-13 00:55:38 -0500
@@ -442,25 +442,34 @@
}
#define SCHEME_CHAR(ch) (c_isalnum (ch) || (ch) == '-' || (ch) == '+')
+/* Scheme characters should be alphanumeric or + or - */
-/* Return 1 if the URL begins with any "scheme", 0 otherwise. As
- currently implemented, it returns true if URL begins with
- [-+a-zA-Z0-9]+: . */
-
+/* Return 1 if a possible naming scheme specifier is found, 0 otherwise. */
bool
url_has_scheme (const char *url)
{
const char *p = url;
+ char *p2;
+ unsigned scheme_len, i;
- /* The first char must be a scheme char. */
- if (!*p || !SCHEME_CHAR (*p))
+ assert( p );
+ p2 = strstr( p, "://" );
+ /* strstr can return p if zero length string */
+
+ scheme_len = p2 > p ? p2 - p : 0;
+
+ /* if the possible scheme is only one character it is not valid */
+ /* also we could have something like C://file.txt */
+ if( scheme_len <= 1 )
return false;
- ++p;
- /* Followed by 0 or more scheme chars. */
- while (*p && SCHEME_CHAR (*p))
- ++p;
- /* Terminated by ':'. */
- return *p == ':';
+
+ for( i = 0; i < scheme_len; ++i, ++p )
+ {
+ if( !SCHEME_CHAR( *p ) )
+ return false;
+ }
+
+ return true;
}
int