[DNSOP] Non-ASCII names in DNS messages in JSON

Shane Kerr Fri, 22 Aug 2014 05:16:54 -0700

Paul,

On Thu, 21 Aug 2014 09:02:10 -0700
Paul Hoffman <[email protected]> wrote: 
> Andreas' and Shane's requests differ. And they both ignore the fact
> that JSON defines strings as Unicode characters, not as octets. The
> escaping "defined" in RFC 1035 does not say where it must be applied.


While we propose different solutions, the thing that you think we're
ignoring is what we're actually just working around.

Both Andreas' and my suggestions work by recognizing that ASCII is a
subset of Unicode, and requiring that DNS JSON messages use ASCII.

> Note that the definition of a string in JSON is:
> 
>       string = quotation-mark *char quotation-mark
> 
>       char = unescaped /
>           escape (
>               %x22 /          ; "    quotation mark  U+0022
>               %x5C /          ; \    reverse solidus U+005C
>               %x2F /          ; /    solidus         U+002F
>               %x62 /          ; b    backspace       U+0008
>               %x66 /          ; f    form feed       U+000C
>               %x6E /          ; n    line feed       U+000A
>               %x72 /          ; r    carriage return U+000D
>               %x74 /          ; t    tab             U+0009
>               %x75 4HEXDIG )  ; uXXXX                U+XXXX
> 
>       escape = %x5C              ; \
> 
>       quotation-mark = %x22      ; "
> 
>       unescaped = %x20-21 / %x23-5B / %x5D-10FFFF

Yes, yes.

> Given this, and the likelihood that escaping is going to screw up
> NAME/QNAME exactly where it will be needed the most (to get the exact
> octets of an odd name), I think making NAME/QNAME only hold
> hostnames, and non-hostnames must be in a different field that is
> hex-encoded, will be the easiest to get right.

In Andreas' proposal a name of ^C would be written as "\\003", and in
mine it would be written as "\u0003". I'm not sure why you
think this would cause more of a chance of error than "03".
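
To make the comparison concrete, here is a minimal Python sketch of the three spellings of the ^C name. The helper names are hypothetical, and the octet-to-code-point mapping used for the "\u0003" form (Latin-1 decoding) is an assumption about how such a proposal could be implemented, not something any draft specifies. The RFC 1035 style escaper is deliberately simplified.

```python
import json

def rfc1035_escape(name: bytes) -> str:
    # Hypothetical helper: printable ASCII passes through, backslash and
    # double quote get a backslash prefix, all other octets become \DDD
    # (three decimal digits), in the style of RFC 1035 master files.
    out = []
    for octet in name:
        ch = chr(octet)
        if ch in '\\"':
            out.append("\\" + ch)
        elif 0x21 <= octet <= 0x7E:
            out.append(ch)
        else:
            out.append("\\%03d" % octet)
    return "".join(out)

def json_unicode_escape(name: bytes) -> str:
    # Assumption: each octet maps to the code point of the same value
    # (Latin-1 decoding). json.dumps then emits \uXXXX for anything
    # outside printable ASCII, because ensure_ascii defaults to True.
    return json.dumps(name.decode("latin-1"))

name = b"\x03"  # the ^C name
print(json.dumps(rfc1035_escape(name)))  # "\\003"
print(json_unicode_escape(name))         # "\u0003"
print(json.dumps(name.hex()))            # "03"
```

All three outputs are plain ASCII JSON strings; they differ only in which layer does the escaping.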

To be honest, I think a more likely scenario is that a coder consuming
this data would not bother to look at any specifications, but build a
quick parser, which would then break every couple days as some random
packet has a "QNAME" instead of "hostQNAME" value show up. :)

Further, Andreas' and my proposals both have the nice property that an
SRV lookup would appear like "_sip._tcp.example.com", instead of
"5f7369702e5f7463702e6578616d706c652e636f6d".

One slight advantage of my proposal over Andreas' is that a consumer
would likely not have to do anything fancy to read the data as it was
in the original message. (A producer might have to do some gymnastics
to ensure that %x7F to %xFF are output properly, depending on how the
messages are generated, I suppose.)
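
As a rough illustration of that asymmetry, the sketch below round-trips a name through JSON. It assumes each octet is mapped to the code point of the same value (Latin-1), which is one possible reading of the proposal rather than a specified rule; `encode_name` and `decode_name` are hypothetical names.

```python
import json

def encode_name(name: bytes) -> str:
    # Producer side (assumed mapping): octet N becomes code point U+00NN,
    # and ensure_ascii=True keeps the JSON text pure ASCII, so %x7F to
    # %xFF come out as \u007f to \u00ff.
    return json.dumps({"QNAME": name.decode("latin-1")}, ensure_ascii=True)

def decode_name(message: str) -> bytes:
    # Consumer side: a stock JSON parser undoes the \uXXXX escapes;
    # Latin-1 encoding turns the code points back into octets.
    return json.loads(message)["QNAME"].encode("latin-1")

wire = bytes([0x03, 0x7F, 0xFF]) + b".example.com"
text = encode_name(wire)
assert text.isascii()
assert decode_name(text) == wire
```

The consumer needs nothing beyond an ordinary JSON parser and one encode call; the producer is the side that has to make sure non-ASCII output is escaped rather than emitted raw.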

To sum up, I think that if you're going to the bother of transforming
DNS messages into some vaguely human-readable format, you should try to
make it as readable as possible.

Cheers,

--
Shane

_______________________________________________
DNSOP mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/dnsop
