Re: [DNSOP] Character encoding of URI Target RDATA?

Patrik Fältström Tue, 16 Jun 2015 19:07:42 -0700

On 16 Jun 2015, at 22:45, Robert Edmonds wrote:

John Levine wrote:
What I'm asking is how the octet sequences provided by the URI RR RFC are decoded into the sequences of URI characters used by the URI RFC.
Is there a generic way to do this, or does it depend on the specific
protocol (e.g., HTTP), or is it left up to the application?
As far as I can see, RFC 3986 defines URIs as sequences of ASCII
characters.  In the few places where they mention non-ASCII material,
it says to represent them as percent encoded UTF-8, so it's still all
ASCII.
OK.  That RFC seems to distance itself from mere octets.
Can you give an example of URI RDATA where it would make sense to
interpret it other than as ASCII?
This is the FTP example from the URI RR RFC, to which the UTF-8 byte order mark has been gratuitously added:


Hmm... what RFC are you referring to? I cannot find this in RFC 7553.

TYPE256 \# 36 000a0001efbbbf6674703a2f2f667470312e6578616d706c652e636f6d2f7075626c6963
or, equivalently,

 URI 10 1 "\239\187\191ftp://ftp1.example.com/public";
Attempting to decode it as ASCII simply does the wrong thing, but I don't see any reason that it's not a valid URI RR, and, knowing that it's encoded as UTF-8 w/ BOM, it can be successfully parsed into a URI (provided the Target field is handed off to the URI-parsing application as raw bytes, and not as a string with DNS zone file \DDD style escapes).
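
To make the point above concrete, here is a minimal Python sketch that splits the example RDATA into its fields and tries both decodings. It assumes the RFC 7553 wire layout (16-bit Priority, 16-bit Weight, then the Target occupying the rest of the RDATA); the helper name parse_uri_rdata is hypothetical and not taken from any DNS library.

```python
import struct

# Generic RDATA from the example above: 36 octets, hex-encoded.
RDATA_HEX = (
    "000a0001efbbbf6674703a2f2f667470312e"
    "6578616d706c652e636f6d2f7075626c6963"
)

def parse_uri_rdata(rdata: bytes):
    """Split URI RDATA into (priority, weight, target_bytes).

    Hypothetical helper; assumes two big-endian 16-bit integers
    followed by the Target as raw octets.
    """
    if len(rdata) < 4:
        raise ValueError("URI RDATA too short")
    priority, weight = struct.unpack("!HH", rdata[:4])
    return priority, weight, rdata[4:]

rdata = bytes.fromhex(RDATA_HEX)
priority, weight, target = parse_uri_rdata(rdata)

# Decoding the Target as ASCII fails on the three BOM octets.
try:
    target.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding as UTF-8 with BOM handling ("utf-8-sig") strips the BOM.
uri_text = target.decode("utf-8-sig")
```

This only works because the Target is handled as raw bytes; a string still carrying zone file \DDD escapes would have to be unescaped first.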


The RFC says this:

This field holds the URI of the target, enclosed in double-quote
characters ('"'), where the URI is as specified in RFC 3986
[RFC3986].  Resolution of the URI is according to the definitions for
the Scheme of the URI.

I suppose to be perfectly clear we might either say "percent encode
everything" or we might say "unencoded UTF-8 is allowed."  They're
both unambiguous, and I expect most parsers can handle both.
It would be very nice indeed if application developers did not have to guess at the encoding of the bytes.
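
As an illustration of the two options named above ("percent encode everything" versus "unencoded UTF-8 is allowed"), the following Python sketch builds both forms of a Target for the same URI. The URI and its non-ASCII path are made up for this example and do not come from RFC 7553.

```python
from urllib.parse import quote, unquote

# Hypothetical URI with a non-ASCII path, for illustration only.
base = "http://example.com"
path = "/bl\u00e5b\u00e4r"  # "/blåbär"

# Option 1: unencoded UTF-8 octets in the Target.
raw_target = (base + path).encode("utf-8")

# Option 2: percent-encoded UTF-8, so the Target is pure ASCII.
pct_path = quote(path)  # keeps "/" unescaped, encodes the rest as UTF-8
pct_target = (base + pct_path).encode("ascii")

# Both forms name the same characters once decoded.
same = unquote(pct_target.decode("ascii")) == raw_target.decode("utf-8")
```

A consumer that knows which option is in use can decode without guessing; one that does not know has to try both.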

Earlier versions of the I-D did say explicitly that the Target is to be interpreted as UTF-8 encoded characters, but the feedback was that it is better to just reuse the same specification as URIs. I.e., the interpretation is according to RFC 3986 (which implies it is unclear wherever 3986 might be unclear).


   Patrik

_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop
