I read these self-describing, extensible points in the context of EDN,
which has a syntax/wire format for some types (maps, strings, etc.) and
also has an extensibility syntax:

#myapp/Person {:first "Fred" :last "Mertz"}

These tagged elements are "extensions" because they allow values of types
not known to EDN to be included in the stream, and are "self-describing" in
two senses:

* if a wire format reader does know how to create a myapp/Person, that
blob of data contains all the information needed to do so
* if a wire format reader doesn't know how to create a myapp/Person, it
can still read past this particular element in the stream, because tags
have a defined envelope, so a reader can figure out where the data
comprising this element ends
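
To make the two senses concrete, here is a rough sketch in Python. It is
not a real EDN reader: it only handles strings, keywords, maps, and tags,
and the names read_value, Tagged, Person, handlers, and #other/Thing are
all made up for illustration.

```python
from dataclasses import dataclass

WHITESPACE = " \t\r\n,"          # EDN treats commas as whitespace
DELIMITERS = WHITESPACE + '{}"'


@dataclass(frozen=True)
class Tagged:
    """Fallback for a tagged element whose tag has no registered handler."""
    tag: str
    value: object


@dataclass(frozen=True)
class Person:
    first: str
    last: str


def skip_ws(text, pos):
    while pos < len(text) and text[pos] in WHITESPACE:
        pos += 1
    return pos


def token_end(text, pos):
    while pos < len(text) and text[pos] not in DELIMITERS:
        pos += 1
    return pos


def read_value(text, pos, handlers):
    """Read one value at or after text[pos]; return (value, next_pos)."""
    pos = skip_ws(text, pos)
    if pos >= len(text):
        raise ValueError("unexpected end of input")
    ch = text[pos]
    if ch == '"':                      # string (no escape handling here)
        end = text.index('"', pos + 1)
        return text[pos + 1:end], end + 1
    if ch == ":":                      # keyword, kept as a str like ":first"
        end = token_end(text, pos)
        return text[pos:end], end
    if ch == "{":                      # map: alternating keys and values
        result = {}
        pos = skip_ws(text, pos + 1)
        while pos < len(text) and text[pos] != "}":
            key, pos = read_value(text, pos, handlers)
            val, pos = read_value(text, pos, handlers)
            result[key] = val
            pos = skip_ws(text, pos)
        if pos >= len(text):
            raise ValueError("unterminated map")
        return result, pos + 1
    if ch == "#":                      # tag followed by exactly one value
        end = token_end(text, pos)
        tag = text[pos + 1:end]
        value, pos = read_value(text, end, handlers)
        handler = handlers.get(tag)
        # Known tag: build the real type. Unknown tag: keep a generic
        # wrapper; either way we know where the element ended.
        return (handler(value) if handler else Tagged(tag, value)), pos
    raise ValueError(f"unsupported syntax at position {pos}")


stream = '#myapp/Person {:first "Fred" :last "Mertz"} #other/Thing {:x "y"} "after"'
handlers = {"myapp/Person": lambda m: Person(m[":first"], m[":last"])}

known, pos = read_value(stream, 0, handlers)
unknown, pos = read_value(stream, pos, handlers)
trailing, pos = read_value(stream, pos, handlers)
```

Here known comes back as a Person, unknown comes back as a generic Tagged
wrapping the map, and trailing is still read correctly as the string that
follows, even though the reader had no handler for the second tag.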

The JSON example is mostly about the "extensibility" attribute. JSON's
format natively supports some types (like strings) but not others (like
dates), and for those others, JSON's format does not include a way to
"bucket" or "envelope" data comprising those unknown types. So JSON is not
extensible.
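
A small illustration of that gap, using Python's standard json module (the
field name "modified" and the helper try_encode are just examples I made
up):

```python
import json
from datetime import date


def try_encode(value):
    """Return the JSON text for value, or a message if it cannot be encoded."""
    try:
        return json.dumps(value)
    except TypeError as exc:
        return f"cannot encode: {exc}"


# Strings are a native JSON type, so this round-trips.
encoded_string = try_encode({"name": "Fred"})

# Dates are not a native JSON type, and the format offers no envelope for
# them, so the encoder refuses unless the writer flattens the date itself.
encoded_date = try_encode({"modified": date(2014, 1, 18)})

# Once flattened to a string, the reader just sees a string.
decoded = json.loads('{"modified": "2014-01-18"}')
```

After decoding, decoded["modified"] is an ordinary str; nothing in the
stream says it was ever a date.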

The Google example is mostly about the "self-describing" attribute, and to
my mind is more accurately framed as a statement about the Internet as a
whole. Hypothetically, if all data exchange occurred using data formats
whose details were private arrangements between writers and readers (for
instance, all servers only spoke Protocol Buffers and used a different
schema for each client), there would be no Internet at all, much less a
Google that, as a third party, is able to broadly read and understand data
made available by servers. (Or, to your point, any ability of clients
lacking knowledge of the schema to parse anything useful from a server
data stream would at best be inferential and heuristic: possible, but
infeasible on a large scale.)
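
As a loose analogy (this is not Protocol Buffers, just Python's struct
module standing in for any format whose layout is agreed out of band, and
the year/day-of-year schema is invented for the example):

```python
import struct

# Writer's private schema (an assumption for this sketch): two unsigned
# 16-bit little-endian integers, a year followed by a day-of-year.
payload = struct.pack("<HH", 2014, 18)

# A reader who was given the schema recovers the values.
year, day_of_year = struct.unpack("<HH", payload)

# A reader who guesses a different layout (one unsigned 32-bit integer)
# also decodes without error, but gets a meaningless number.
(guess,) = struct.unpack("<I", payload)
```

The bytes themselves carry nothing that tells a third party which of the
two readings is the intended one.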

With all that said, my read is that Rich bundled those two points together
in the JSON date example. JSON doesn't have an extensibility syntax to
support dates, but people still have to transmit dates over JSON, so how do
they do that? One way is by adopting a "convention". In some ways that is
better than an out-of-band schema because, as you say, a convention gives
a reader additional information to heuristically interpret the stream. In
other ways it is worse, because it isn't consistent: some people will want
date fields to look like "dateModified", others will want "modifiedDate",
and others will use "modificationDatetime".
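
Here is roughly what reading by such a convention looks like, sketched in
Python. The rule "a key containing 'date' holds an ISO date string" is the
kind of agreement from the talk's example; parse_by_convention and the
field names "modifiedOn" and "candidate" are hypothetical.

```python
import json
from datetime import date


def parse_by_convention(text):
    """Decode a JSON object, treating keys that contain 'date' as ISO dates."""
    doc = json.loads(text)
    out = {}
    for key, value in doc.items():
        if "date" in key.lower() and isinstance(value, str):
            try:
                out[key] = date.fromisoformat(value)
                continue
            except ValueError:
                pass  # name matched the convention, value was not a date
        out[key] = value
    return out


parsed = parse_by_convention(
    '{"dateModified": "2014-01-18",'
    ' "modifiedOn": "2014-01-18",'
    ' "candidate": "Fred"}'
)
```

The first field is recognized, the second is a date the convention misses
because the writer picked a different name, and the third matches the
naming rule by accident and only survives because the value fails to parse.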

So in a broad sense, it is not desirable to use a data format that does not
include an extensibility capability which is itself self-describing. A
format that lacks extensibility creates a combinatorial explosion of
conventions for conveying values not known to the format, and extensions
that are not self-describing require out-of-band agreements between readers
and writers that can preclude the scalable third-party interoperability
that is so important to the Internet.

Hope that helps.


On Sat, Jan 18, 2014 at 6:08 PM, Brian Craft <craft.br...@gmail.com> wrote:

> Ok, so consider a different system (besides google) that handles the JSON
> example. If it has no prior knowledge of the date field, of what use is it
> to know that it's a date? What is a situation where a system reading the
> JSON needs to know a field is a date, but has no idea what the field is for?
>
>
> On Saturday, January 18, 2014 1:27:31 PM UTC-8, Jonas wrote:
>>
>> IIRC in that particular part of the talk he was specifically talking
>> about (non-self describing) protocol buffers and not JSON.
>>
>> On Saturday, January 18, 2014 10:00:09 PM UTC+2, Brian Craft wrote:
>>>
>>> Regarding Rich's talk (http://www.youtube.com/watch?v=ROor6_NGIWU), can
>>> anyone explain the points he's trying to make about self-describing and
>>> extensible data formats, with the JSON and google examples?
>>>
>>> He argues that google couldn't exist if the web depended on out-of-band
>>> schemas. He gives as an example of such a schema a JSON encoding where an
>>> out-of-band agreement is made that field names with substring "date" refer
>>> to string-encoded dates.
>>>
>>> However, this is exactly the sort of thing google does. It finds dates,
>>> and other data types, heuristically, and not through the formats of the web
>>> being self-describing or extensible.
>>>
>>>
>>> --
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to clojure+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>
