2012/3/12 Jeremy Shaw <[email protected]>:
> On Sun, Mar 11, 2012 at 1:33 PM, Jason Dusek <[email protected]> wrote:
>> Well, to quote one example from RFC 3986:
>>
>> 2.1. Percent-Encoding
>>
>> A percent-encoding mechanism is used to represent a data octet in a
>> component when that octet's corresponding character is outside the
>> allowed set or is being used as a delimiter of, or within, the
>> component.
>
> Right. This describes how to convert an octet into a sequence of characters,
> since the only thing that can appear in a URI is sequences of characters.
>
>> The syntax of URIs is a mechanism for describing data octets,
>> not Unicode code points. It is at variance to describe URIs in
>> terms of Unicode code points.
>
> Not sure what you mean by this. As the RFC says, a URI is defined entirely
> by the identity of the characters that are used. There is definitely no
> single, correct byte sequence for representing a URI. If I give you a
> sequence of bytes and tell you it is a URI, the only way to decode it is to
> first know what encoding the byte sequence represents.. ascii, utf-16, etc.
> Once you have decoded the byte sequence into a sequence of characters, only
> then can you parse the URI.
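To make the quoted point concrete, here is a minimal Haskell sketch. It is not taken from URLb or any other package: the names percentEncodeOctet, isUnreserved, encodeOctets, decodeAscii and decodeUtf16BE are hypothetical, and only base is assumed. It shows percent-encoding turning data octets into characters, and two different byte sequences decoding to the same characters. The UTF-16 decoder is deliberately simplified (big-endian only, no surrogate pairs).

```haskell
import Data.Char (chr, intToDigit, isAlphaNum, isAscii, toUpper)
import Data.Word (Word8)

-- | Percent-encode one octet as the three characters "%XX"
--   (RFC 3986, section 2.1). Hypothetical helper name.
percentEncodeOctet :: Word8 -> String
percentEncodeOctet w = ['%', hexDigit (w `div` 16), hexDigit (w `mod` 16)]
  where
    hexDigit n = toUpper (intToDigit (fromIntegral n))

-- | Unreserved characters per RFC 3986, section 2.3:
--   ALPHA / DIGIT / "-" / "." / "_" / "~".
isUnreserved :: Char -> Bool
isUnreserved c = (isAscii c && isAlphaNum c) || c `elem` "-._~"

-- | Represent data octets as URI characters: unreserved octets map to
--   their ASCII character, everything else is percent-encoded.
encodeOctets :: [Word8] -> String
encodeOctets = concatMap enc
  where
    enc w
      | isUnreserved c = [c]
      | otherwise      = percentEncodeOctet w
      where
        c = chr (fromIntegral w)

-- | Decode bytes that are known to be ASCII into characters.
--   Returns Nothing if any octet is outside the ASCII range.
decodeAscii :: [Word8] -> Maybe String
decodeAscii = traverse toChar
  where
    toChar w
      | w < 128   = Just (chr (fromIntegral w))
      | otherwise = Nothing

-- | Decode big-endian UTF-16 into characters. Simplifying assumption:
--   surrogate code units and odd-length input are rejected with Nothing.
decodeUtf16BE :: [Word8] -> Maybe String
decodeUtf16BE [] = Just []
decodeUtf16BE (hi : lo : rest)
  | isSurrogate = Nothing
  | otherwise   = (chr unit :) <$> decodeUtf16BE rest
  where
    unit        = fromIntegral hi * 256 + fromIntegral lo :: Int
    isSurrogate = unit >= 0xD800 && unit <= 0xDFFF
decodeUtf16BE _ = Nothing
```

Under these assumptions, decodeAscii [0x68,0x74,0x74,0x70] and decodeUtf16BE [0,0x68,0,0x74,0,0x74,0,0x70] both yield Just "http", so a URI parser would work on that String rather than on either byte sequence.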
Mr. Shaw,

Thanks for taking the time to explain all this. It has really
helped me understand many parts of the URI spec much better. I have
deprecated my module in the latest release

  http://hackage.haskell.org/package/URLb-0.0.1

because a URL parser working on bytes instead of characters
now strikes me as a confused idea.

--
Jason Dusek
pgp /// solidsnack 1FD4C6C1 FED18A2B

_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe
