On Aug 21, 2014, at 8:27 AM, Shane Kerr <[email protected]> wrote:

>>> * In general I'm not super enthusiastic about the mixing of binary
>>> and formatted data - I tend to think an application will want one
>>> or the other. Perhaps it makes more sense to define two formats,
>>> one binary and one formatted? Or...
>> 
>> All fields are optional, so a profile could say "don't include these"
>> or "always include those". Further, and more importantly, most RDATA
>> are binary. I did not want to force implementations to use the
>> presentation format for RDATA.
> 
> The problem with an "all fields are optional" approach is that it puts
> all the burden on the consumer of the data, right? You literally have
> no idea what to expect. (That's kind of why I proposed some sort of
> schema below.)

Correct.

> I understand not wanting to force implementations to use the
> presentation format for RDATA... OTOH it seems likely that the reason
> people are putting data in JSON is so they can see what it is. We could
> always try the RFC 3597 approach for an unknown RTYPE?

Define "unknown". Note that at least one "known" RTYPE has its description 
wrong, but you'd only know that if you read the RFC errata.

I could have RDATA be a string in presentation format, and hope that 
all emitters and receivers get that right, with an optional additional field, 
rdataOctets!, for a base64url-encoded version. If the emitter doesn't know the 
presentation form, it uses only rdataOctets!; if the emitter isn't sure that the 
receiver knows the presentation form, it adds an rdataOctets! field as 
well. If the emitter knows that the RDATA is broken (such as a three-octet 
value for an A record), it uses just an rdataOctets! field. Would that suffice 
for you? 
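
To make the three emitter cases above concrete, here is a minimal sketch in Python. It is not taken from the draft: the "rdata" field name, the helper names, the receiver_known_good flag, and the choice to keep base64 padding are all assumptions for illustration, and only A and AAAA are treated as having a known presentation form.

```python
import base64
import ipaddress


def to_presentation(rtype, octets):
    """Return the presentation form, or None if unknown or malformed.

    Only A and AAAA are handled in this sketch; every other RTYPE is
    treated as "presentation form not known to this emitter".
    """
    if rtype == "A" and len(octets) == 4:
        return str(ipaddress.IPv4Address(octets))
    if rtype == "AAAA" and len(octets) == 16:
        return str(ipaddress.IPv6Address(octets))
    return None  # unknown RTYPE, or broken RDATA such as a 3-octet A


def emit_rdata(rtype, octets, receiver_known_good=True):
    """Build the RDATA-related fields for one record.

    "rdata" is a hypothetical field name; "rdataOctets!" is the name
    proposed in this message.
    """
    record = {}
    text = to_presentation(rtype, octets)
    if text is not None:
        record["rdata"] = text
    # Fall back to (or add) the raw octets when the presentation form is
    # unavailable or the receiver may not understand it.
    if text is None or not receiver_known_good:
        record["rdataOctets!"] = base64.urlsafe_b64encode(octets).decode("ascii")
    return record


if __name__ == "__main__":
    print(emit_rdata("A", bytes([192, 0, 2, 1])))
    print(emit_rdata("A", bytes([192, 0, 2, 1]), receiver_known_good=False))
    print(emit_rdata("A", bytes([192, 0, 2])))  # broken: three octets
```

A receiver following the same assumptions would prefer "rdata" when it is present and understood, and decode "rdataOctets!" otherwise.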

>>> * Maybe it makes sense to define a meta-record so consumers can know
>>> what to expect? Something that lists which names will (or may)
>>> appear.
>> 
>> That would be a JSON schema. Just using that phrase will cause
>> screaming in the Apps Area. Having said that, it's perfectly
>> reasonable for a profile to insist that each record have a profile
>> indicator such as "Profile": "Private DNS interchange v3.1".
> 
> Screaming aside, applications will either have an implicit schema or an
> explicit one. Defining the problem to be out of scope may be necessary
> to get something published, but that's a symptom of IETF brokenness
> IMHO, since it reduces the usefulness of any such RFC. :(

The IETF looked at the horrid mess created by XML schema, and decided that 
implicit schemas described in text are better. And before you attack the 
messenger on this, note that (a) I tried to get the IETF to look at JSON schema 
again and (b) I'm the co-chair of the JSON WG.

>>> I'd be mildly curious to see a comparison of the compressed sizes of
>>> JSON-formatted data (without data duplicated as binary stuff) versus
>>> non-JSON-formatted data. My intuition is that compression will
>>> remove most of the horrible redundancy that is involved in JSON,
>>> but there's only one way to be sure. ;)
>> 
>> Sure. It's pretty trivial to do, for example, a CBOR format that
>> follows this; there are now CBOR libraries for most popular modern
>> languages (see http://cbor.io/). If folks here want that, I can add
>> it as an appendix. To be clear, however, I haven't heard anyone
>> saying they want compression so badly they are willing to lose
>> readability of the data.
> 
> Oh, I meant with gzip or the like, not some JSON crafted format.
> 
> So the idea is:
> 
>   $ tcpdump -w somefile.pcap
>   $ pcap2dnsjson somefile.pcap somefile.json
>   $ gzip somefile.pcap
>   $ gzip somefile.json
>   $ ls -l somefile.{pcap,json}.gz
> 
> Then compare the sizes of the compressed files.
> 
> The idea being that when moving files around via scp or rsync or
> whatever they'd probably be compressed like this, and probably also for
> medium-term storage. My hope is that a compressed JSON is roughly the
> same size as a compressed raw pcap file, since basically they have the
> same entropy.
> 
> The reason I bring this up is to give a feel for the size cost of a
> bloated text format in practice. :)

Noted. That's an easy thing for implementers of this format to do. So is saying 
that size is so important (maybe because we finally have so many DNSSEC keys 
passed about) that CBOR or something like it would be better.
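
For anyone who wants to run the comparison described above without shelling out to gzip, a rough sketch in Python follows. The function names are made up for illustration, and it assumes both files already exist (the pcap-to-JSON conversion step is not shown) and fit in memory.

```python
import gzip
from pathlib import Path


def gzip_size(path):
    """Return the size in bytes of the gzip-compressed file contents."""
    return len(gzip.compress(Path(path).read_bytes()))


def compare_compressed(pcap_path, json_path):
    """Compare compressed sizes of a raw capture and its JSON rendering."""
    pcap_gz = gzip_size(pcap_path)
    json_gz = gzip_size(json_path)
    return {
        "pcap_gz": pcap_gz,
        "json_gz": json_gz,
        # A ratio near 1.0 would mean the text format costs little
        # once both files are compressed.
        "ratio": json_gz / pcap_gz,
    }


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 3:
        print(compare_compressed(sys.argv[1], sys.argv[2]))
    else:
        print("usage: compare.py somefile.pcap somefile.json")
```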

--Paul Hoffman
_______________________________________________
DNSOP mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/dnsop
