Hi,

On 10/2/23 10:55, Bachir Bendrissou wrote:
Hi,

The following url example contains a semicolon in the userinfo segment:


http://a <http://a>;b:c@xyz

Wget rejects this URL with the following error message:

http://a <http://a>;b:c@xyz: Bad port number.

It seems that Wget interprets "c" as a port number: when "c" is replaced by a
digit, Wget accepts the URL and attempts to resolve "xyz".
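For comparison, here is a minimal Python sketch that splits the authority in the order suggested by the RFC 3986 shape authority = [ userinfo "@" ] host [ ":" port ]: locate "@" first, and only then look for a port colon in what remains. In that order, "a;b:c" is the userinfo and "xyz" the host, with no port. The function name split_authority is hypothetical. The sketch assumes an absolute URL containing "://", ignores IPv6 literals and percent-decoding, and is not a description of how Wget's parser actually works.

```python
from typing import Optional, Tuple


def split_authority(url: str) -> Tuple[Optional[str], str, Optional[str]]:
    """Hypothetical helper: split an absolute URL's authority into
    (userinfo, host, port). Assumes "scheme://" is present; ignores IPv6
    literals and percent-decoding."""
    rest = url.split("://", 1)[1]

    # The authority ends at the first "/", "?" or "#" (RFC 3986, 3.2).
    end = len(rest)
    for delim in "/?#":
        idx = rest.find(delim)
        if idx != -1:
            end = min(end, idx)
    authority = rest[:end]

    # Look for "@" first: neither userinfo nor host may contain a literal
    # "@", so everything before the (last) "@" is userinfo, colons and
    # semicolons included.
    userinfo = None
    if "@" in authority:
        userinfo, authority = authority.rsplit("@", 1)

    # Only the remaining host part may carry a ":port" suffix.
    port = None
    if ":" in authority:
        authority, port = authority.rsplit(":", 1)
    return userinfo, authority, port


print(split_authority("http://a;b:c@xyz"))   # ('a;b:c', 'xyz', None)
print(split_authority("http://xyz:8080/p"))  # (None, 'xyz', '8080')
```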

Wget doesn't follow the current specs; its parser is lenient so that it accepts some kinds of badly formatted URLs seen in the wild.

But we should possibly become stricter and more compliant with the current specs.


It's worth noting that curl and aria2 both accept the URL example.

My version of curl (8.3.0) doesn't accept it:

curl -vvv 'http://a <http://a>;b:c@xyz'
* URL rejected: Malformed input to a URL function
* Closing connection
curl: (3) URL rejected: Malformed input to a URL function

All URL parsers differ slightly when it comes to edge cases.
I'd consider curl a good reference.
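As one more data point (not something tested in this thread), Python's standard urllib.parse.urlsplit also accepts the URL and keeps the semicolon in the username. Note that urlsplit does not validate against RFC 3986, so this shows how one lenient parser splits the string, not what the spec allows.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://a;b:c@xyz")

# urlsplit takes everything between "//" and the next "/", "?" or "#" as
# the netloc, splits userinfo off at the last "@", then splits the
# userinfo at its first ":" into username and password.
print(parts.netloc)    # a;b:c@xyz
print(parts.username)  # a;b
print(parts.password)  # c
print(parts.hostname)  # xyz
print(parts.port)      # None
```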

Why is the semicolon not allowed in userinfo, even though other special
characters are allowed?

First of all, userinfo does not allow spaces at all, whereas the semicolon is allowed as one of the sub-delims (see https://datatracker.ietf.org/doc/html/rfc3986):
  userinfo    = *( unreserved / pct-encoded / sub-delims / ":" )
  unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
  sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
  pct-encoded = "%" HEXDIG HEXDIG
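To check a userinfo string against the grammar above, a hand-rolled Python sketch could look like this (the names is_valid_userinfo, UNRESERVED, SUB_DELIMS and HEXDIG are hypothetical). It accepts ";" as a sub-delim but rejects the space, "<", ">" and "/" that appear in the pasted URL.

```python
import string

# Character classes from the RFC 3986 ABNF quoted above.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")
HEXDIG = set(string.hexdigits)  # RFC 3986 treats a-f and A-F as equivalent


def is_valid_userinfo(userinfo: str) -> bool:
    """Return True if userinfo matches
    *( unreserved / pct-encoded / sub-delims / ":" )."""
    i = 0
    while i < len(userinfo):
        ch = userinfo[i]
        if ch in UNRESERVED or ch in SUB_DELIMS or ch == ":":
            i += 1
        elif (ch == "%" and i + 2 < len(userinfo)
              and userinfo[i + 1] in HEXDIG and userinfo[i + 2] in HEXDIG):
            i += 3  # a complete pct-encoded triplet such as "%3B"
        else:
            return False  # space, "@", "<", "/", truncated "%", ...
    return True


print(is_valid_userinfo("a;b:c"))             # True
print(is_valid_userinfo("a <http://a>;b:c"))  # False
```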


Thank you,
Bachir

Regards, Tim

