On 16 Jun 2015, at 22:45, Robert Edmonds wrote:

John Levine wrote:
What I'm asking is how the octet sequences provided by the URI RR RFC are decoded into the sequences of URI characters used by the URI RFC.
Is there a generic way to do this, or does it depend on the specific
protocol (e.g., HTTP), or is it left up to the application?

As far as I can see, RFC 3986 defines URIs as sequences of ASCII
characters.  In the few places where it mentions non-ASCII material,
it says to represent it as percent-encoded UTF-8, so it's still all
ASCII.
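To make the RFC 3986 point concrete, here is a small Python sketch. The example path is made up. It percent-encodes a non-ASCII character as UTF-8 with the standard library's urllib.parse.quote, which yields a pure-ASCII string, and then decodes it back with unquote:

```python
from urllib.parse import quote, unquote

# A made-up path segment containing U+00E9 (LATIN SMALL LETTER E WITH ACUTE).
path = "/caf\u00e9"

# quote() encodes the text as UTF-8 (its default) and percent-encodes every
# octet outside the unreserved set; "/" is kept because it is listed in safe.
ascii_path = quote(path, safe="/")
print(ascii_path)  # /caf%C3%A9
assert ascii_path.isascii()

# unquote() reverses the process, decoding the octets as UTF-8 again.
assert unquote(ascii_path) == path
```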

OK.  That RFC seems to distance itself from mere octets.

Can you give an example of URI RDATA where it would make sense to
interpret it other than as ASCII?

This is the FTP example from the URI RR RFC, to which the UTF-8 byte order mark has been gratuitously added:

Hmm... what RFC are you referring to? I cannot find this in RFC 7553.

TYPE256 \# 36 000a0001efbbbf6674703a2f2f667470312e6578616d706c652e636f6d2f7075626c6963
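For anyone following along, here is a minimal Python sketch that splits this RDATA into its RFC 7553 fields. It assumes the wire layout of a 16-bit Priority, a 16-bit Weight, and the remainder of the RDATA as Target. The function name parse_uri_rdata is hypothetical:

```python
import struct

# Hex RDATA from the generic (RFC 3597) form "TYPE256 \# 36 ...".
rdata_hex = ("000a0001efbbbf6674703a2f2f667470312e6578616d706c652e636f6d"
             "2f7075626c6963")


def parse_uri_rdata(rdata: bytes) -> tuple[int, int, bytes]:
    """Split URI RR RDATA into Priority, Weight and Target.

    Priority and Weight are 16-bit unsigned integers in network byte
    order; Target is the rest of the RDATA, returned as raw octets.
    """
    if len(rdata) < 4:
        raise ValueError("URI RDATA must be at least 4 octets")
    priority, weight = struct.unpack("!HH", rdata[:4])
    return priority, weight, rdata[4:]


rdata = bytes.fromhex(rdata_hex)
assert len(rdata) == 36  # matches the "\# 36" length field
priority, weight, target = parse_uri_rdata(rdata)
print(priority, weight, target)
# 10 1 b'\xef\xbb\xbfftp://ftp1.example.com/public'
```

Note that Target comes out as bytes, not str. Choosing an encoding is exactly the open question here.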

or, equivalently,

 URI 10 1 "\239\187\191ftp://ftp1.example.com/public";
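Going from the presentation form back to the raw Target octets means undoing the zone file escapes. This is a minimal sketch. It assumes only the RFC 1035 section 5.1 rules (\DDD is a decimal octet value, \X quotes a non-digit character X) and takes the text without its surrounding double quotes. unescape_character_string is a hypothetical name:

```python
DIGITS = "0123456789"


def unescape_character_string(text: str) -> bytes:
    """Turn zone file presentation text (quotes already removed) into octets.

    Implements the RFC 1035 section 5.1 escapes: "\\DDD" is the octet with
    decimal value DDD, and "\\X" (X not a digit) is the literal character X.
    Any other character is expected to be ASCII.
    """
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out += ch.encode("ascii")
            i += 1
        elif i + 1 >= len(text):
            raise ValueError("dangling backslash at end of text")
        elif text[i + 1] in DIGITS:
            digits = text[i + 1:i + 4]
            if len(digits) != 3 or any(d not in DIGITS for d in digits):
                raise ValueError(f"malformed \\DDD escape: {digits!r}")
            value = int(digits)
            if value > 255:
                raise ValueError(f"\\{digits} is not a valid octet value")
            out.append(value)
            i += 4
        else:
            out += text[i + 1].encode("ascii")
            i += 2
    return bytes(out)


# The Target from the presentation form, without the surrounding quotes.
target = unescape_character_string(r"\239\187\191ftp://ftp1.example.com/public")
print(target)  # b'\xef\xbb\xbfftp://ftp1.example.com/public'
```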

Attempting to decode it as ASCII simply does the wrong thing, but I don't see any reason why it's not a valid URI RR. If the application knows that it is encoded as UTF-8 with a BOM, it can successfully parse it into a URI, provided the Target field is handed off to the URI-parsing application as raw bytes and not as a string with DNS zone file \DDD-style escapes.
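Here is a rough Python illustration of the difference. It assumes the application has decided to treat Target as UTF-8 with an optional byte order mark. decode_target is a hypothetical helper and is not something RFC 7553 specifies. Python's "utf-8-sig" codec strips a leading BOM. Plain "ascii" decoding fails on the first octet. Plain "utf-8" keeps U+FEFF, which prevents urlsplit from recognizing the scheme:

```python
from urllib.parse import urlsplit

# Target octets from the example above: a UTF-8 BOM followed by an FTP URI.
target = b"\xef\xbb\xbfftp://ftp1.example.com/public"


def decode_target(raw: bytes) -> str:
    """Hypothetical policy: interpret Target as UTF-8, dropping a leading
    byte order mark if one is present."""
    return raw.decode("utf-8-sig")


# Strict ASCII decoding rejects the BOM octets outright.
try:
    target.decode("ascii")
except UnicodeDecodeError as exc:
    print("ASCII decode failed:", exc.reason)  # ordinal not in range(128)

# Plain UTF-8 keeps U+FEFF as the first character, so no scheme is found.
assert urlsplit(target.decode("utf-8")).scheme == ""

uri = decode_target(target)
parts = urlsplit(uri)
print(parts.scheme, parts.netloc, parts.path)
# ftp ftp1.example.com /public
```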

The RFC says this:

This field holds the URI of the target, enclosed in double-quote
characters ('"'), where the URI is as specified in RFC 3986
[RFC3986].  Resolution of the URI is according to the definitions for
the Scheme of the URI.

I suppose that, to be perfectly clear, we might say either "percent-encode
everything" or "unencoded UTF-8 is allowed."  They're both
unambiguous, and I expect most parsers can handle both.
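The two options are also easy to reconcile on the consuming side. Here is a sketch that assumes Target is either percent-encoded ASCII or raw UTF-8. It decodes the octets as UTF-8, then percent-encodes everything outside the RFC 3986 unreserved and reserved sets, leaving existing % escapes alone. normalize_target and URI_SAFE are hypothetical names:

```python
from urllib.parse import quote

# RFC 3986 reserved characters (gen-delims and sub-delims) plus "%", so
# that existing percent-escapes pass through untouched. quote() never
# encodes the unreserved set (ALPHA, DIGIT, "-", ".", "_", "~").
URI_SAFE = ":/?#[]@!$&'()*+,;=%"


def normalize_target(raw: bytes) -> str:
    """Hypothetical normalizer: accept a Target that is either already
    percent-encoded ASCII or raw UTF-8, and return an ASCII-only URI."""
    text = raw.decode("utf-8")  # pure ASCII is valid UTF-8 as well
    return quote(text, safe=URI_SAFE)


percent_form = b"http://example.com/caf%C3%A9"
utf8_form = "http://example.com/caf\u00e9".encode("utf-8")

# Both spellings normalize to the same RFC 3986 string.
assert normalize_target(percent_form) == normalize_target(utf8_form)
print(normalize_target(utf8_form))  # http://example.com/caf%C3%A9
```

This sketch does not check that a literal "%" actually starts a well-formed escape. A real parser would need to validate that separately.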

It would be very nice indeed if application developers did not have to guess at the encoding of the bytes.

Earlier versions of the I-D did say explicitly that the Target is to be interpreted as UTF-8 encoded characters, but the feedback was that it is better to just reuse the same specification as for URIs. I.e., the interpretation is according to RFC 3986 (which means it is unclear wherever 3986 might be unclear).

   Patrik

_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop
