On Dec 5, 2013, at 11:34 PM, Carsten Bormann wrote:

> Allen,
> 
> thank you a lot for this elaborate response.  It really helps me understand 
> the different points of view that have emerged.  I’ll go ahead and insert my 
> personal point of view in this response to maybe make that more 
> understandable as well (I’m not speaking for the JSON WG at all here, of 
> course).  Maybe you can relay this to es-discuss so both mailing lists 
> benefit from it.

Did your reply bounce from es-discuss?  I won't elide any of your comments 
below, just in case.

If you or anybody else knows of actual bugs or ambiguities in ECMA-404, the best 
way to communicate that to TC39 and the ECMA-404 project editor is to open a 
ticket at bugs.ecmascript.org.  Product: "ECMA-404 JSON", Component: "1st 
Edition".

> 
>>> The syntax just tells you which sequences of symbols are part of the 
>>> language.
>>> (This is what we have ABNF for; ECMA-404 uses racetracks plus some English 
>>> language that maps the characters to the tokens in the syntax-level 
>>> racetracks for value, object, and array, and to the English language 
>>> components of the token-level racetracks for number and string.)
>> 
>> Agreed.  I would state this as:  the syntax tells you which sequences of 
>> symbols form valid statements within the language.
>> 
>> Language designer also use the term "static semantics".  The static 
>> semantics of a language are a set of rules that further restrict  which 
>> sequences of symbols form valid statements within the language. 
> 
> Right, the "static semantics" is used to form a subset of what we arbitrarily 
> call “syntax”, further restricting what sequence of symbols is in the 
> language. 
> 
>> For example, a rule that the 'member' names must be disjoint within an 
>> 'object' production could be a static semantic rule (however, there is 
>> intentionally no such rule in ECMA-404).
> 
> Thanks, it is interesting to hear that this was a deliberate omission.
> 
>> The line between syntax and static semantics can be fuzzy.  Static semantic 
>> rules are typically used to express rules that cannot be technically 
>> expressed using the chosen syntactic formalism or rules which are simply 
>> inconvenient to express using that formalism.  For example, the editor of 
>> ECMA-404 chose to simplify the RR track expression of the JSON syntax by 
>> using static semantic rules for whitespace rather than incorporating them 
>> into RR diagrams. 
> 
> No, that isn’t static semantics.  The racetracks don’t have a useful meaning 
> (i.e., express a different, more restricted syntax) without the English 
> language rules about whitespace,  (More specifically, three of the racetracks 
> operate on a different domain than the other two, without that having made 
> explicit.)  Static semantics can only serve to restrict the set of 
> syntactically valid symbol sequences.  Accepting whitespace is on the syntax 
> level.  (Then ignoring it is indeed semantics.)

I think we're quibbling here about unimportant points. Multi-level 
specification is a common practice in language specification.  For example, 
using regular expressions to define the lexical productions (the tokens) and a 
BNF grammar to define the syntactic level. It is also common practice to use 
prose to describe the role of whitespace at the lexical level.  For example, see 
http://www.ecma-international.org/ecma-262/5.1/#sec-5.1.2 
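To make the whitespace point concrete, here is a small illustration of my own 
(not taken from either spec): whitespace between JSON tokens is accepted by the 
grammar but carries no meaning, so any conforming parser maps the two texts 
below to the same value.

```python
import json

# Whitespace between tokens is allowed by the syntax but is
# semantically insignificant: both texts denote the same value.
compact = '{"a":[1,2]}'
spaced = ' {\n  "a" : [ 1 , 2 ]\n} '
assert json.loads(compact) == json.loads(spaced)
```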

The important point is whether or not ECMA-404 underspecifies the language, is 
ambiguous, or has any other errors. If it does, please file bug reports so 
corrections can be made in a revised edition. 

> 
>> Another form of static semantic rules are equivalences that state when two 
>> or more different sequences of symbols must be considered as equivalent.  
>> For example, the rules that state equivalencies between escape sequences and 
>> individual code points within an JSON 'string'.  Such equivalences are not 
>> strictly necessary at this level, but it it simplifies the specification of 
>> higher level semantics if equivalent symbol sequences can be normalized at 
>> this level of specification.
> 
> It may be convenient to lump this under static semantics (the static 
> semantics may need to rely on such rules), but we are now in the area of 
> semantic interpretation, no longer in the area of what should be strictly 
> syntax but has been split into “syntax" and "static semantics" for notational 
> convenience.

I disagree. It is useful at the syntactic/static semantic level to specify that 
two symbol sequences must be equivalent for semantic purposes.  And we can do 
this without providing any actual semantics for the symbol sequences.  For 
example:

We can say that
   "abc"
and
   "\u0061\u0062\u0063"
must be assigned identical semantics without actually specifying what those 
semantics are. Whether you prefer to call it static semantics or something else, 
it is independent of any specific semantic domain and reasonably at the level 
of concerns addressed by ECMA-404.
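Any conforming parser demonstrates this equivalence without the format 
specification ever having to say what a string "means".  A quick sketch of my 
own in Python:

```python
import json

# "abc" and "\u0061\u0062\u0063" are different symbol sequences that
# a conforming parser must map to the same semantic value, whatever
# that value's ultimate meaning is to an application.
literal = '"abc"'
escaped = '"\\u0061\\u0062\\u0063"'
assert json.loads(literal) == json.loads(escaped) == "abc"
```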

> 
>> When we talk about the "semantics" of a language (rather than "static 
>> semantics") we are talking about attributing meaning (in some domain and 
>> context) to well-formed (as specified via syntax and static semantics) 
>> statements expressed in that language.
> 
> Exactly.
> 
>> ECMA-404 intentionally restricts itself to specify the syntax and static 
>> semantics of the JSON language.  More below on why.
> 
> If that was the intention, that didn’t work out too well.

Specific bugs, please...

> 
>>> Semantics is needed to describe e.g. that some whitespace is 
>>> “insignificant” (not contributing to the semantics), describe the intended 
>>> interpretation of escape sequences in strings,
>> Yes these are static semantic rules (although whitespace rules could be 
>> expressed using syntactic formalisms).
> 
> The syntax allows the whitespace.  The semantics tells you it doesn’t make a 
> difference with respect to the meaning.  (OK, if you lump in semantic 
> equivalence under static semantics, you can say the above, but this muddies 
> the terms.)
> 
>>> that the sequences of symbols enabled by the production “number” are to be 
>>> interpreted in base 10,
>> Yes, ECMA-404 includes this as a static semantic statement although it is 
>> arguably could be classified as a semantic statement above the level of 
>> static semantics.  Whether "77" is semantically interpreted as the 
>> mathematical value 63 or 77 isn't really relevant to whether "77" is a 
>> well-formed JSON number.
> 
> ECMA-404 indeed does not provide the full semantics of its numbers, just 
> saying that they are “represented in base 10”, appealing to a deeply rooted 
> common understanding of what that means (which by the way has been codified 
> in ECMA-63 and then ISO 6093).  Note that there is no meaning of “represented 
> in” outside of the domain of semantics — the text clearly is about mapping 
> the abstract (semantic) concept of a number to its base-10 representation 
> using JSON’s syntax.  It seems that this phrasing is a remnant from a time 
> when the semantics was intended to be part of the specification.

"represented in base 10" probably would be better stated as "represented as a 
sequence of decimal digits" which would eliminate the semantic implication. 

Yes, there are remnants in ECMA-404 (and in RFC 4627bis) from the days when the 
JSON format and its language binding to ECMAScript tended to be equated. One of 
the things we should be trying to do is eliminate those remnants.

> 
>>> or that “the order of the values is significant” in arrays (which seems to 
>>> be intended to contrast them to JSON objects, where ECMA-404 weasels out of 
>>> saying whether the order is significant).
>> 
>> ECMA-404 removed the statement "an object is an unordered collection..." 
>> that exists in RFC 4627.  
> 
> Indeed, it is again interesting to note that this was an intentional change 
> from the existing JSON specifications.
> 
>> Arguably, ECMA-404 should not have made the statement "the the order of 
>> values is significant" WRT arrays.  I'll file a bug ticket on that.  The 
>> reason that neither of these statements is appropriate at this level of 
>> specification is that they are pure semantic statements that have no impact 
>> upon determining whether a sequence of symbols are well-formed JSON text.
> 
> Well, in your definition of static semantics that includes semantic 
> equivalence, the statement is appropriate.  It is, however, somewhat random 
> whether ECMA-404 provides statements about semantic equivalence or not; it is 
> certainly not trying for any completeness.

More specifics, please.  I don't see how semantic equivalence enters into this 
discussion of arrays. What equivalences come into play?  As I said above, I 
think the existence of the phrase "the order of values is significant" is a 
bug.  "Significant" to what?  Certainly the intent wasn't to forbid a schema 
level semantics from considering [1,2] and [2,1] as being equivalent in some 
particular field position.
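To illustrate what I mean (my own sketch, not spec text): a parser necessarily 
preserves the observable element order, while a schema layer remains free to 
discard it.

```python
import json

# The two texts are observably different at the parsed-text level...
a = json.loads("[1, 2]")
b = json.loads("[2, 1]")
assert a != b            # element order is observable

# ...but a schema-level semantics may still treat both as the
# same unordered collection in some field position.
assert set(a) == set(b)
```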

> 
>> Objectively, the members of a JSON 'object' do occur in a specific order and 
>> a semantic interpreter of an object might ascribe meaning to that ordering.  
>> Similarly, a JSON 'array' also has an objectively observable ordering of its 
>> contained values. It is again up to a semantic interpreter as to whether or 
>> not it ascribes meaning to that ordering.
> 
> It is also up to a semantic interpreter as to whether it interprets base-10 
> numbers from left to right or from right to left.  However, I would argue 
> that some of the potential interpretations are violating the principle of 
> least surprise.  More so, JSON in the real world benefits from a significant 
> amount of common interpretation.

Agreed.  I believe we have this today, at this level.

> A reasonable way to capture this at the specification level is to define a 
> generic “JSON data model” and define the semantic processing that leads up to 
> this, but then of course leave it up to the application how to interpret the 
> elements of the JSON data model.  A JSON array would be delivered as an 
> ordered sequence to the (application-independent) JSON data model, but the 
> application could still interpret that information as a set or as a record 
> structure, depending on application context.

What do you mean by "delivered" in your second sentence?  It sounds like you 
are either talking about a language binding or perhaps a JSON parser interface. 
The former is clearly in the realm that I classify as semantics, and I would 
expect any reasonable parser-based interface to preserve all ordering 
relationships that exist in the parsed text.  

As another example, the JSON to ECMAScript language binding defined by ECMA-262 
implicitly defines an ordering of the properties of the ECMAScript objects that 
are created corresponding to JSON objects, even though RFC 4627 said that an 
object is an unordered collection of name/value pairs.  It just falls out of 
the ECMAScript data model. 
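The same thing happens in other language bindings.  For example (my 
illustration): Python dicts preserve insertion order (guaranteed since Python 
3.7), so its json binding also incidentally exposes the textual order of object 
members, without the format spec saying anything about it.

```python
import json

# The member order of the JSON text leaks through the binding
# simply because the target data structure happens to preserve it.
obj = json.loads('{"b": 1, "a": 2}')
assert list(obj) == ["b", "a"]
```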

We could try to say that all semantics applied to the JSON format MUST preserve 
the ordering of JSON array elements. But it seems unnecessary and in some cases 
excessively restrictive.

Defining a complete and universal "JSON data model" is hard.  It is possible to 
define a normative JSON syntax without providing such a model, and that is the 
direction that ECMA-404 has taken. If somebody wants to attempt to define such 
a data model, they are welcome to write a spec layered above ECMA-404 and to 
demonstrate its utility. 

In practice, JSON is almost useless without schema-level semantic agreement 
between the producer and consumer of a JSON text. Most of the issues we are 
discussing here are easily subsumed by such schema-level agreements.
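By "schema-level agreement" I mean something as simple as the following sketch 
(all names here are invented for illustration): producer and consumer agree 
that a message is an object with an integer "id" and a string "name", and the 
consumer checks that before ascribing any meaning to the text.

```python
import json

def check_message(text: str) -> dict:
    """Enforce a hypothetical producer/consumer agreement on shape."""
    value = json.loads(text)
    assert isinstance(value, dict)
    assert isinstance(value.get("id"), int)
    assert isinstance(value.get("name"), str)
    return value

msg = check_message('{"id": 7, "name": "widget"}')
```

Questions like duplicate keys, number precision, and element ordering are all 
naturally settled at this layer rather than in the format spec.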

> 
>>> ECMA-404 does quite a bit of of the latter, so indeed I also have trouble 
>>> interpreting such a statement.
>> 
>> So back to "semantics" and why ECMA-404 tries (perhaps imperfectly) to avoid 
>> describing JSON beyond the level of static semantics. 
>> 
>> ECMA-404 see JSON as "a text format that facilitates structured data 
>> interchange between all programming languages. JSON
>> is syntax of braces, brackets, colons, and commas that is useful in many 
>> contexts, profiles, and applications”.
> 
> I hope by now it should be clear that ECMA-404 is neither very successful in 
> focusing on the syntax only, nor is it a particularly good specification of 
> that syntax due to its mix of English language and racetrack graphics.  (I 
> still like it as a tutorial for the syntax.)

No, it isn't clear.  Specific bugs would help clarify. I didn't choose to use 
racetracks in the specification and I might not have made that choice myself, 
but I will defend it as a valid formalism and one that is well understood.  A 
grammar expressed using ovals and arrows is just as valid as one expressed 
using ASCII characters. 

It's silly to be squabbling over such notational issues, and 
counter-productive if such squabbles result in multiple different normative 
standards for the same language/format.

TC39 would likely be receptive to a request to add to ECMA-404 an informative 
annex with a BNF grammar for JSON (even ABNF, even though it isn't TC39's 
normal BNF conventions). Asking is likely to produce better results than 
throwing stones.

> 
>> There are many possible semantics and categories of semantics that can be 
>> applied to well-formed statements expressed using the JSON syntax.
> 
> The problem with this approach is that much of the interoperability of JSON 
> stems from implementations having derived a common data model.  Some of this 
> is in the spec (RFC 4627), some if it has been derived by implementers 
> drawing analogies with its ancestor JavaScript, some of it stems from the 
> fact that the syntax is simply suggestive of a specific data model.
> 
> Much more would be gained in documenting that (common) data model (including 
> documenting the differences that have ensued, and selecting some deviations 
> as canonical and others as mistakes) than from retracting some of the 
> explicit semantics (while keeping some of them as well as the implicit ones 
> weakly in place).

This is where I disagree.  Do you have any examples of interoperability 
problems occurring at this level? As I said above, successful JSON 
interoperability is most dependent upon schema-level semantic agreement and 
good language bindings. In practice, those levels can easily encompass the sort 
of data model issues you seem to be concerned about.

However, I don't think TC39 wants to put any barriers in front of somebody 
trying to specify such a data model or models.  We tried to avoid such barriers 
by not including unnecessary semantic restrictions in ECMA-404.

> 
>> One type of semantics are language bindings that specify how a JSON text 
>> might be translated into the data types and structures of some particular 
>> programming language or runtime environment. The translation of a JavaScript 
>> string encoding of a JSON text into JavaScript objects and values by 
>> JSON.parse is one specific example of this kind of semantic application of 
>> JSON.  But there are many languages that can be supported by such language 
>> bindings and there is not necessarily a best or canonical JSON binding for 
>> any language.
> 
> Leaving out the common data model and going directly from syntax to language 
> binding is a recipe for creating interoperability problems.  The approach 
> “works” from the view of a single specific language (and thus may seem 
> palatable for a group of experts in a specific language, such as TC39), but 
> it is not aiding in interoperability of JSON at large.

Examples? The language expertise within TC39 certainly extends beyond just 
ECMAScript and that expertise informs the consensus decisions we make.

A counter-example we have actually discussed is any limitation on the number of 
digits in a JSON number.   While some applications of JSON might want to limit 
the precision, others have a need for arbitrarily large digit sequences.  Such 
restrictions and allowances must be dealt with at the schema specification 
level, so there is no need to arbitrarily restrict precision at the 
format/language level of specification.
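To see why this belongs to the binding/schema layer rather than the grammar (my 
own sketch): the syntax places no limit on digit count, and a binding can 
choose either a lossy or an exact representation for the same text.

```python
import json
from decimal import Decimal

# The grammar happily accepts a 40-digit integer; Python's int
# binding preserves it exactly.
big = json.loads("1" * 40)
assert big == int("1" * 40)

# For fractions, the default float binding may lose digits, but a
# Decimal binding keeps the full textual precision.
text = "3.141592653589793238462643383279"
exact = json.loads(text, parse_float=Decimal)
assert str(exact) == text
```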

That's it for now. I think I've already addressed any substantive issues you 
raised below.

Happy to continue the conversation. 

Allen

> 
>> Another form of semantics imposes schema based meaning and restrictions upon 
>> a well-formed JSON text.  A schema explicitly defines an application level  
>> meaning to the elements for some specific subset of well-formed JSON texts. 
>> It might require only certain forms of JSON values, provide specific meaning 
>> to JSON numbers or strings that occur in specified positions, require the 
>> occurrence of certain object members, apply meaning to the ordering of 
>> object members or array elements, etc. This is probably the most common form of 
>> semantics applied to JSON and is used by almost all real world JSON use 
>> cases.
> 
> Again, leaving out the common data model and leaping from the syntax to a 
> specific application semantics negates all the real-world advantages of 
> having a common data interchange format.
> 
>> The problem with trying to standardize JSON semantics is that the various 
>> semantics that can be usefully be imposed upon JSON are often mutually 
>> incompatible with each other. At a trivial level, we see this with issues 
>> like the size of numbers or duplicate object member keys.  It is very hard 
>> to decide whose semantics are acceptable and whose is not.
> 
> Completely agree.  The next step in the evolution of JSON should have been to 
> actually do this hard work based on the experience we have after a decade of 
> usage, instead of punting on it.
> 
>> What we can do is draw a bright line just above the level of static 
>> semantics. This is what ECMA-404 attempts to do. It defines a small set of 
>> structuring elements that can be recursively composed and represented in a 
>> textual encoding. It provides a common vocabulary upon which various 
>> semantics can be overlaid and nothing else.  The intent of ECMA-404 is to 
>> provide the definitive specification of the syntax and static semantic of 
>> the JSON format that can be used by higher level semantic specifications.
> 
> It might have been a good idea to do just that as a first step, but rushing 
> out ECMA-404 with little feedback from the wider community has apparently 
> compromised the quality of the result.  As it stands, the seven-year old RFC 
> 4627 continues to fulfill this very objective in a better way.
> 
> Grüße, Carsten
> 
> _______________________________________________
> json mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/json
> 

_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss
