I read these self-describing, extensible points in the context of EDN, which has a syntax/wire format for some types (maps, strings, etc.) and also has an extensibility syntax:
#myapp/Person {:first "Fred" :last "Mertz"}

These tagged elements are "extensions" because they allow values of types not known to EDN to be included in the stream, and are "self-describing" in two senses:

* if a wire format reader does know how to create a myapp/Person, that blob of data contains all the information needed to do so
* if a wire format reader doesn't know how to create a myapp/Person, it can still read past this particular element in the stream, because tags have a defined envelope, so a reader can figure out where the data comprising this element ends

The JSON example is mostly about the "extensibility" attribute. JSON's format natively supports some types (like strings) but not others (like dates), and for those others, JSON's format does not include a way to "bucket" or "envelope" the data comprising those unknown types. So JSON is not extensible.

The Google example is mostly about the "self-describing" attribute, and to my mind is more accurately framed as a statement about the Internet as a whole. Hypothetically, if all data exchange occurred using data formats whose details were private arrangements between writers and readers (for instance, if all servers only spoke Protocol Buffers and used a different schema for each client), there would be no Internet at all, much less a Google that, as a third party, is able to broadly read and understand data made available by servers. (Or, to your point, any ability of clients lacking knowledge of the schema to parse anything useful from a server data stream would at best be inferential and heuristic: possible, but infeasible on a large scale.)

With all that said, my read is that Rich bundled those two points together in the JSON date example: JSON doesn't have an extensibility syntax to support dates, but people still have to transmit dates over JSON, so how do they do that?
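Before getting to the answer, here is a rough Python sketch of the envelope idea, in case it helps make the contrast concrete. It is only an illustration under my own assumptions: the names Person, Tagged and read_tagged are made up, the element is assumed to be already split into its tag and payload, and this is not how any actual EDN library is implemented.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Person:
    first: str
    last: str


@dataclass(frozen=True)
class Tagged:
    """Generic wrapper for an element whose tag the reader does not know."""
    tag: str
    value: Any


def read_tagged(tag: str, value: Any,
                handlers: Dict[str, Callable[[Any], Any]]) -> Any:
    """Build a typed value if a handler is registered, else keep the envelope."""
    handler = handlers.get(tag)
    if handler is None:
        # Unknown tag: the reader still knows where the element ends,
        # so it can carry the tag and payload along untouched.
        return Tagged(tag, value)
    return handler(value)


# The Person element from above, assumed already split into tag and payload.
tag, payload = "myapp/Person", {"first": "Fred", "last": "Mertz"}

known = read_tagged(tag, payload, {"myapp/Person": lambda m: Person(**m)})
unknown = read_tagged(tag, payload, {})

print(known)    # Person(first='Fred', last='Mertz')
print(unknown)  # Tagged(tag='myapp/Person', value={'first': 'Fred', 'last': 'Mertz'})

# JSON has no such envelope: a date arrives as an ordinary string, and
# nothing in the stream itself says it should be treated as a date.
doc = json.loads('{"dateModified": "2014-01-18T18:08:00Z"}')
print(type(doc["dateModified"]).__name__)  # str
```

The point of the last three lines is that the JSON reader has nothing to dispatch on, so any date handling has to come from somewhere outside the stream.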
One way is by adopting a "convention". In some ways that is better than an out of band schema because, as you say, a convention gives a reader additional information with which to heuristically interpret the stream. In other ways it is worse, because it isn't consistent: some people will want date fields to look like "dateModified", others will want "modifiedDate", and others will use "modificationDatetime".

So in a broad sense, it is not desirable to use a data format that lacks an extensibility capability which is itself self-describing. A format that lacks extensibility creates a combinatorial explosion in conventions to convey values not known to the format, and extensions that are not self-describing require out of band agreements between readers and writers that can preclude the scalable third-party interoperability that is so important to the Internet.

Hope that helps.

On Sat, Jan 18, 2014 at 6:08 PM, Brian Craft <craft.br...@gmail.com> wrote:

> Ok, so consider a different system (besides google) that handles the JSON
> example. If it has no prior knowledge of the date field, of what use is it
> to know that it's a date? What is a situation where a system reading the
> JSON needs to know a field is a date, but has no idea what the field is for?
>
>
> On Saturday, January 18, 2014 1:27:31 PM UTC-8, Jonas wrote:
>>
>> IIRC in that particular part of the talk he was specifically talking
>> about (non-self describing) protocol buffers and not JSON.
>>
>> On Saturday, January 18, 2014 10:00:09 PM UTC+2, Brian Craft wrote:
>>>
>>> Regarding Rich's talk (http://www.youtube.com/watch?v=ROor6_NGIWU), can
>>> anyone explain the points he's trying to make about self-describing and
>>> extensible data formats, with the JSON and google examples?
>>>
>>> He argues that google couldn't exist if the web depended on out-of-band
>>> schemas.
>>> He gives as an example of such a schema a JSON encoding where an
>>> out-of-band agreement is made that field names with substring "date" refer
>>> to string-encoded dates.
>>>
>>> However, this is exactly the sort of thing google does. It finds dates,
>>> and other data types, heuristically, and not through the formats of the web
>>> being self-describing or extensible.
>>>
>>>
>>> --
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to clojure+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>