On Friday, August 17, 2001, at 01:07, Sven Neuhaus wrote:
> Are mailto: and news: URLs official? If so they're missing..
I don't know of any "official" list of valid schemes in URLs.
RFC 2396 ("Uniform Resource Identifiers (URI): Generic Syntax")
has this to say:
3. URI Syntactic Components
The URI syntax is dependent upon the scheme. In general, absolute
URI are written as follows:
<scheme>:<scheme-specific-part>
An absolute URI contains the name of the scheme being used (<scheme>)
followed by a colon (":") and then a string (the <scheme-specific-
part>) whose interpretation depends on the scheme.
The URI syntax does not require that the scheme-specific-part have
any general structure or set of semantics which is common among all
URI. However, a subset of URI do share a common syntax for
representing hierarchical relationships within the namespace. This
"generic URI" syntax consists of a sequence of four main components:
<scheme>://<authority><path>?<query>
each of which, except <scheme>, may be absent from a particular URI.
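To make the two shapes quoted above concrete, here is a small Python sketch. It uses the standard library's urllib.parse.urlsplit, which follows later revisions of the URI syntax rather than RFC 2396 exactly, so treat it as an illustration only. The helper name split_uri and the example.org addresses are made up for this message.

```python
from urllib.parse import urlsplit

def split_uri(uri):
    """Return (scheme, authority, path, query) for a URI string.

    Hypothetical helper for illustration; urlsplit calls the
    authority component "netloc".
    """
    parts = urlsplit(uri)
    return (parts.scheme, parts.netloc, parts.path, parts.query)

# A hierarchical ("generic") URI has all four components.
generic = split_uri("http://example.org/a/b?x=1")

# A mailto: URI has only a scheme and a scheme-specific part,
# which urlsplit reports as the path.
opaque = split_uri("mailto:sven@example.org")
```

The second call shows why a URL finder cannot rely on "://" being present: mailto: and news: URIs have no authority component at all.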
The RFC goes on to define "scheme" as a letter followed by any number
of letters, digits, "+", "-", or ".", which corresponds to the regex
/^[a-z][a-z0-9+.-]*$/i.
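As a rough check of the scheme rule, RFC 2396 section 3.1 gives it as scheme = alpha *( alpha | digit | "+" | "-" | "." ). The Python sketch below is my own transcription of that rule, not anything taken from the RFC text, and the name is_valid_scheme is hypothetical.

```python
import re

# Assumed transcription of the RFC 2396 rule:
#   scheme = alpha *( alpha | digit | "+" | "-" | "." )
# One letter, then zero or more letters, digits, "+", "-", or ".".
SCHEME_RE = re.compile(r"[a-z][a-z0-9+.-]*", re.IGNORECASE)

def is_valid_scheme(name):
    """Return True if the whole string is a syntactically valid scheme."""
    return SCHEME_RE.fullmatch(name) is not None
```

Under this rule "mailto", "news", and even a single letter such as "x" are accepted, while "1abc" and the empty string are rejected. The check is purely syntactic; it says nothing about whether a scheme is actually in use.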
So it appears that in URIs, and in URLs as a subset of them, not only
are "mailto" and "news" valid schemes, but so is an unbounded number of
other strings. Furthermore, unless the original problem (find URLs in a
string) is constrained, for example to a fixed list of known schemes,
matching on the scheme syntax alone will generate a multitude of false
positives: any word followed by a colon looks like a scheme.
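Here is a quick Python sketch of the false-positive problem. The pattern, the function name find_candidates, the sample sentence, and the short list of known schemes are all assumptions made up for this example; they are not a proposal for the real matcher.

```python
import re

# Naive pattern: anything shaped like <scheme>:<non-space characters>.
NAIVE_URI_RE = re.compile(r"\b[a-z][a-z0-9+.-]*:\S+", re.IGNORECASE)

# Hypothetical whitelist; a real list would need to be chosen with care.
KNOWN_SCHEMES = {"http", "https", "ftp", "mailto", "news"}

def find_candidates(text, schemes=None):
    """Return scheme-shaped tokens, optionally limited to known schemes."""
    hits = NAIVE_URI_RE.findall(text)
    if schemes is None:
        return hits
    # Keep only hits whose scheme (text before the first colon) is known.
    return [h for h in hits if h.split(":", 1)[0].lower() in schemes]

sample = "Write to mailto:sven@example.org or see Note:this is not a URL"
unconstrained = find_candidates(sample)
constrained = find_candidates(sample, KNOWN_SCHEMES)
```

Without the whitelist, "Note:this" is reported alongside the mailto: address; with it, only the mailto: address survives. Restricting the schemes is one way to constrain the problem, at the cost of missing schemes that are not on the list.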
--
Craig S. Cottingham
[EMAIL PROTECTED]
PGP key available from:
<http://pgp.ai.mit.edu:11371/pks/lookup?op=get&search=0xA2FFBE41>
ID=0xA2FFBE41, fingerprint=6AA8 2E28 2404 8A95 B8FC 7EFC 136F
0CEF A2FF BE41