On 16 Jun 2015, at 22:45, Robert Edmonds wrote:
John Levine wrote:
What I'm asking is how the octet sequences provided by the URI RR RFC
are decoded into the sequences of URI characters used by the URI RFC.
Is there a generic way to do this, or does it depend on the specific
protocol (e.g., HTTP), or is it left up to the application?
As far as I can see, RFC 3986 defines URIs as sequences of ASCII
characters. In the few places where it mentions non-ASCII material,
it says to represent it as percent-encoded UTF-8, so it's still all
ASCII.
OK. That RFC seems to distance itself from mere octets.
Can you give an example of URI RDATA where it would make sense to
interpret it other than as ASCII?
This is the FTP example from the URI RR RFC, to which the UTF-8 byte
order mark has been gratuitously added:
Hmm... which RFC are you referring to? I cannot find this in RFC 7553.
TYPE256 \# 36
000a0001efbbbf6674703a2f2f667470312e6578616d706c652e636f6d2f7075626c6963
or, equivalently,
URI 10 1 "\239\187\191ftp://ftp1.example.com/public"
Attempting to decode it as ASCII simply does the wrong thing, but I
don't see any reason that it's not a valid URI RR, and, knowing that
it's encoded as UTF-8 w/ BOM, it can be successfully parsed into a URI
(provided the Target field is handed off to the URI-parsing
application as raw bytes, and not as a string with DNS zone file \DDD
style escapes).
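
To make the example above concrete, here is a rough Python sketch of how a
consumer might pull the fields out of that RDATA. It assumes the wire layout
of a 16-bit Priority, a 16-bit Weight, and the remaining octets as Target;
the helper names parse_uri_rdata and unescape_presentation are made up for
illustration and are not from any DNS library.

```python
import codecs
import re
import struct

# RDATA from the TYPE256 example above (36 octets).
RDATA_HEX = (
    "000a0001efbbbf6674703a2f2f667470312e6578616d706c652e636f6d2f7075626c6963"
)


def parse_uri_rdata(rdata: bytes):
    """Split URI RDATA into (priority, weight, target octets).

    Assumed layout: two unsigned 16-bit integers in network byte order,
    followed by the Target as raw octets.
    """
    priority, weight = struct.unpack("!HH", rdata[:4])
    return priority, weight, rdata[4:]


def unescape_presentation(text: str) -> bytes:
    """Hypothetical helper: turn zone-file \\DDD and \\X escapes into octets."""

    def replace(match):
        if match.group(1) is not None:
            return bytes([int(match.group(1))])  # \DDD is a decimal octet value
        return match.group(2)  # \X is the literal character X

    return re.sub(rb"\\(\d{3})|\\(.)", replace, text.encode("ascii"))


rdata = bytes.fromhex(RDATA_HEX)
priority, weight, target = parse_uri_rdata(rdata)

# Decoding the Target as ASCII fails because of the three BOM octets.
try:
    target.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# The "utf-8-sig" codec decodes UTF-8 and drops a leading BOM if present.
uri = target.decode("utf-8-sig")

# The presentation-format string from the example yields the same octets.
from_zone_text = unescape_presentation(r"\239\187\191ftp://ftp1.example.com/public")
```

The point of the sketch is only that the Target has to reach the URI parser
as octets. Whether a leading BOM should be tolerated at all is exactly the
open question in this thread.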
The RFC says this:
This field holds the URI of the target, enclosed in double-quote
characters ('"'), where the URI is as specified in RFC 3986
[RFC3986]. Resolution of the URI is according to the definitions for
the Scheme of the URI.
I suppose to be perfectly clear we might either say "percent encode
everything" or we might say "unencoded UTF-8 is allowed." They're
both unambiguous, and I expect most parsers can handle both.
It would be very nice indeed if application developers did not have to
guess at the encoding of the bytes.
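
As an illustration of the two options mentioned above ("percent encode
everything" versus "unencoded UTF-8 is allowed"), the following Python sketch
shows that a consumer could map both forms to the same all-ASCII URI, provided
it is told the octets are UTF-8. The function name to_ascii_uri and the
non-ASCII path "públic" are invented for the example, and the choice to leave
reserved characters and "%" untouched is an assumption, not something the
URI RR RFC specifies.

```python
from urllib.parse import quote

# Left as-is: the RFC 3986 reserved characters, plus "%" so that escapes
# already present in the input are not encoded a second time.
SAFE = ":/?#[]@!$&'()*+,;=%"


def to_ascii_uri(target: bytes) -> str:
    """Assume the Target octets are UTF-8 and percent-encode non-ASCII."""
    text = target.decode("utf-8")  # raises UnicodeDecodeError if not UTF-8
    return quote(text, safe=SAFE)  # quote() encodes as UTF-8 by default


# Hypothetical Target carrying unencoded UTF-8.
raw_form = "ftp://ftp1.example.com/públic".encode("utf-8")
# The same hypothetical Target, already percent-encoded (pure ASCII).
encoded_form = b"ftp://ftp1.example.com/p%C3%BAblic"

from_raw = to_ascii_uri(raw_form)
from_encoded = to_ascii_uri(encoded_form)
```

Both inputs come out as the same ASCII string here, but only because the
function was told up front that the octets are UTF-8. Without that statement
in the specification, the decode step is a guess.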
Earlier versions of the I-D did say explicitly that the Target is to be
interpreted as UTF-8 encoded characters, but the feedback was that it is
better to just reuse the same specification as URIs. I.e. the
interpretation is according to RFC 3986 (which implies that it is unclear
wherever 3986 is unclear).
Patrik
_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop