On Sat, Aug 06, 2022 at 03:20:15PM +0100, Gavin Smith wrote:
> Characters should be protected if they are not part of the syntax of the URL
> but they could be.
>
> Maybe more readable than the WHATWG documentation:
> https://www.rfc-editor.org/rfc/rfc3986#page-12
>
> This gives a list of reserved characters, of which there are quite a few.
> (It's likely that not all of them occur in Texinfo output.)
Why not? In an @uref, the user may well put anything, possibly using
%-encoded or unencoded text.
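For reference, the reserved set from RFC 3986 section 2.2 is short enough to write down. Below is a minimal Python sketch, not Texinfo code. The helper name `reserved_in` is hypothetical. It reports which reserved characters occur in a string such as an @uref argument:

```python
# Reserved characters from RFC 3986, section 2.2.
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")
RESERVED = GEN_DELIMS | SUB_DELIMS


def reserved_in(text: str) -> set:
    """Return the RFC 3986 reserved characters that occur in text.

    Hypothetical helper for illustration only.
    """
    return {c for c in text if c in RESERVED}


if __name__ == "__main__":
    # An @uref argument may contain almost anything.
    print(sorted(reserved_in("http://example.org/a b?x=1&y=(2)")))
```

Note that the space is not in the reserved set. It is simply not allowed unencoded in a URL at all.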
> So if an image filename has a colon in it, that colon should be encoded
> in the href attribute, but a colon that follows the protocol (http:) should
> not be encoded, as you say. Perhaps the percent encoding algorithm could
> be performed on a subset of the URL, rather than taking a URL string and
> percent encoding throughout.
Indeed, I also figured out that image file names, which we know are file
names and not true URLs, should have much more protection. What I will
commit will have everything percent-encoded except for / and :, as : can
follow a drive letter on Windows. I do not know about other separators
that could be used in file names.
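The following is a rough Python illustration of that rule, not the code being committed (texi2any is written in Perl). It relies on `urllib.parse.quote` with `safe="/:"`. The function name `encode_file_name` is hypothetical, and the sketch assumes the file name is available as text that can be encoded as UTF-8:

```python
from urllib.parse import quote


def encode_file_name(file_name: str) -> str:
    """Percent-encode an image file name for an href attribute.

    Hypothetical helper. Everything except unreserved characters
    (letters, digits, '-', '.', '_', '~'), '/' (directory separator)
    and ':' (e.g. after a Windows drive letter, as in C:) is encoded.
    Non-ASCII characters are encoded from their UTF-8 bytes.
    """
    return quote(file_name, safe="/:")


if __name__ == "__main__":
    print(encode_file_name("images/my pic#1 \u00e9.png"))
```

With this choice a Windows backslash separator is not preserved: `\` becomes `%5C`. A literal `%` in a file name becomes `%25`.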
> The treatment of @url/@uref could be different, as you say. The user provides
> the entire URL in the source document. Arguably it is up to the user to
> percent encode appropriately within the URL, and whether non-ASCII bytes
> inside the argument are valid or not is a risk the user has taken.
In that case, and for @email too, I settled on percent-encoding
'lightly': non-ASCII characters, { and }, spaces and not much more,
keeping the characters that can occur in URLs as you describe above, as
well as %.
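A minimal Python sketch of that 'light' encoding follows. The function name `encode_lightly` is hypothetical, and the exact set of encoded characters is an assumption based on the description above. The actual texi2any code may encode a few more. Reserved characters and `%` pass through, so encoding the user already wrote is left untouched:

```python
def encode_lightly(url: str) -> str:
    """Percent-encode only non-ASCII characters, '{', '}' and spaces.

    Hypothetical helper. All other characters, including RFC 3986
    reserved characters and '%', are copied unchanged.
    """
    out = []
    for ch in url:
        if ord(ch) > 127 or ch in "{} ":
            # Encode each UTF-8 byte of the character as %XX.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(encode_lightly("http://example.org/caf\u00e9 {x}?q=a%20b"))
    print(encode_lightly("mailto:someone@example.org"))
```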
--
Pat