On Dec 4, 2013, at 11:39 PM, Carsten Bormann wrote:
> On 05 Dec 2013, at 06:08, Tim Bray <[email protected]> wrote:
>
>> FWIW, I have never understood what the ECMAnauts mean by the word
>> “semantics” in this context, so I have no idea whether I agree with this
>> statement.
As one of the contributors to ECMA-404, I'd be happy to elaborate.
>
> You know this, but just for the record: we could be applying the meaning we
> have for these terms in CS.
Yes, that is indeed the starting point. However, TC39 is largely composed of
language designers and language implementors so the meaning of "semantics" we
use is generally the one used within that branch of CS.
>
> The syntax just tells you which sequences of symbols are part of the language.
> (This is what we have ABNF for; ECMA-404 uses racetracks plus some English
> language that maps the characters to the tokens in the syntax-level
> racetracks for value, object, and array, and to the English language
> components of the token-level racetracks for number and string.)
Agreed. I would state this as: the syntax tells you which sequences of
symbols form valid statements within the language.
Language designers also use the term "static semantics". The static semantics
of a language are a set of rules that further restrict which sequences of
symbols form valid statements within the language. For example, a rule that
the 'member' names must be disjoint within an 'object' production could be a
static semantic rule (however, there is intentionally no such rule in ECMA-404).
The line between syntax and static semantics can be fuzzy. Static semantic
rules are typically used to express rules that cannot be technically expressed
using the chosen syntactic formalism or rules which are simply inconvenient to
express using that formalism. For example, the editor of ECMA-404 chose to
simplify the railroad (RR) track expression of the JSON syntax by using static
semantic rules for whitespace rather than incorporating them into the RR diagrams.
Another form of static semantic rule is an equivalence that states when two or
more different sequences of symbols must be considered equivalent. An example
is the set of rules that state equivalences between escape sequences and
individual code points within a JSON 'string'. Such equivalences are not
strictly necessary at this level, but it simplifies the specification of
higher-level semantics if equivalent symbol sequences can be normalized at this
level of specification.
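As a rough illustration of such an equivalence (a sketch that uses
JavaScript's JSON.parse as one possible consumer, not anything ECMA-404 itself
mandates), two different symbol sequences can denote the same one-character
string. The variable names are made up for the example.

```javascript
// Two different JSON texts for the same one-character string.
const escaped = '"\\u0041"'; // the JSON text "\u0041" (escape sequence)
const literal = '"A"';       // the JSON text "A" (literal code point)

// As sequences of symbols, the two texts are different...
const sameText = escaped === literal;

// ...but a consumer that applies the equivalence rule sees the same string.
const sameValue = JSON.parse(escaped) === JSON.parse(literal);

console.log(sameText, sameValue); // false true
```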
When we talk about the "semantics" of a language (rather than "static
semantics") we are talking about attributing meaning (in some domain and
context) to well-formed (as specified via syntax and static semantics)
statements expressed in that language.
ECMA-404 intentionally restricts itself to specifying the syntax and static
semantics of the JSON language. More below on why.
>
> Semantics is needed to describe e.g. that some whitespace is “insignificant”
> (not contributing to the semantics), describe the intended interpretation of
> escape sequences in strings,
Yes, these are static semantic rules (although whitespace rules could be
expressed using syntactic formalisms).
> that the sequences of symbols enabled by the production “number” are to be
> interpreted in base 10,
Yes, ECMA-404 includes this as a static semantic statement, although it
arguably could be classified as a semantic statement above the level of static
semantics. Whether "77" is semantically interpreted as the mathematical value
63 or 77 isn't really relevant to whether "77" is a well-formed JSON number.
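A small sketch of that distinction, assuming a JavaScript environment: the
symbol sequence "77" is well-formed whichever radix an interpreter applies.
The octal reading is purely hypothetical and is shown only for contrast.

```javascript
const text = '77'; // a well-formed JSON number, whatever it is taken to mean

// The usual reading: base 10, as ECMA-404 states.
const asDecimal = JSON.parse(text);

// A hypothetical interpreter that read the same digits in base 8.
const asOctal = parseInt(text, 8);

console.log(asDecimal, asOctal); // 77 63
```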
> or that “the order of the values is significant” in arrays (which seems to be
> intended to contrast them to JSON objects, where ECMA-404 weasels out of
> saying whether the order is significant).
ECMA-404 removed the statement "an object is an unordered collection..." that
exists in RFC 4627. Arguably, ECMA-404 should not have made the statement "the
order of the values is significant" WRT arrays. I'll file a bug ticket on
that. The reason that neither of these statements is appropriate at this level
of specification is that they are pure semantic statements that have no impact
upon determining whether a sequence of symbols is well-formed JSON text.
Objectively, the members of a JSON 'object' do occur in a specific order and a
semantic interpreter of an object might ascribe meaning to that ordering.
Similarly, a JSON 'array' also has an objectively observable ordering of its
contained values. It is again up to a semantic interpreter as to whether or not
it ascribes meaning to that ordering.
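To make this concrete, here is a hedged sketch of one semantic interpreter
(JSON.parse followed by Object.keys) that happens not to preserve the textual
order of object members when the names look like array indices. It assumes
the property enumeration order of current JavaScript engines; other bindings
may behave differently.

```javascript
// Same members, different textual order.
const first = '{"2": "x", "1": "y"}';
const second = '{"1": "y", "2": "x"}';

// Current engines enumerate integer-like property names in ascending
// numeric order, so this interpreter discards the textual ordering.
const keysFirst = Object.keys(JSON.parse(first));
const keysSecond = Object.keys(JSON.parse(second));
console.log(keysFirst, keysSecond); // [ '1', '2' ] [ '1', '2' ]

// For arrays, the same interpreter does preserve the textual ordering.
const arr = JSON.parse('[2, 1]');
console.log(arr[0], arr[1]); // 2 1
```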
>
> ECMA-404 does quite a bit of the latter, so indeed I also have trouble
> interpreting such a statement.
So back to "semantics" and why ECMA-404 tries (perhaps imperfectly) to avoid
describing JSON beyond the level of static semantics.
ECMA-404 sees JSON as "a text format that facilitates structured data
interchange between all programming languages. JSON is syntax of braces,
brackets, colons, and commas that is useful in many contexts, profiles, and
applications".
There are many possible semantics and categories of semantics that can be
applied to well-formed statements expressed using the JSON syntax.
One type of semantics consists of language bindings that specify how a JSON text might
be translated into the data types and structures of some particular programming
language or runtime environment. The translation of a JavaScript string
encoding of a JSON text into JavaScript objects and values by JSON.parse is one
specific example of this kind of semantic application of JSON. But there are
many languages that can be supported by such language bindings and there is not
necessarily a best or canonical JSON binding for any language.
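As a sketch of why there need not be a single canonical binding, the same
JSON text can be mapped into JavaScript in more than one reasonable way. The
member name "when" and the choice to turn it into a Date are assumptions made
up for this example.

```javascript
const text = '{"when": "2013-12-04T23:39:00Z", "count": 77}';

// Binding 1: the default JSON.parse mapping; "when" stays a string.
const plain = JSON.parse(text);

// Binding 2: a reviver (hypothetical policy) that maps "when" to a Date.
const withDates = JSON.parse(text, (key, value) =>
  key === 'when' ? new Date(value) : value);

console.log(typeof plain.when);              // string
console.log(withDates.when instanceof Date); // true
```

Both results come from the same well-formed text; they differ only in the
semantics the binding chose to apply.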
Another form of semantics imposes schema-based meaning and restrictions upon a
well-formed JSON text. A schema explicitly assigns an application-level
meaning to the elements of some specific subset of well-formed JSON texts. It
might require only certain forms of JSON values, provide specific meaning to
JSON numbers or strings that occur in specified positions, require the
occurrence of certain object members, apply meaning to the ordering of object
members or array elements, etc. This is probably the most common form of
semantics applied to JSON and is used by almost all real-world JSON use cases.
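A minimal sketch of a schema-style restriction, assuming a made-up
application rule that a "point" is an object with numeric "x" and "y"
members. The function isPoint is hypothetical and does not follow any
particular schema language.

```javascript
// Hypothetical application-level rule layered on top of well-formed JSON.
function isPoint(value) {
  return typeof value === 'object' && value !== null &&
    !Array.isArray(value) &&
    typeof value.x === 'number' && typeof value.y === 'number';
}

// Both texts are well-formed JSON; only the first satisfies the schema.
const accepted = isPoint(JSON.parse('{"x": 1, "y": 2}'));
const rejected = isPoint(JSON.parse('{"x": "1", "y": 2}'));

console.log(accepted, rejected); // true false
```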
The problem with trying to standardize JSON semantics is that the various
semantics that can usefully be imposed upon JSON are often mutually
incompatible. At a trivial level, we see this with issues like
the size of numbers or duplicate object member keys. It is very hard to decide
whose semantics are acceptable and whose are not.
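Both issues can be sketched with texts that are well-formed but whose meaning
depends on the consumer. The example assumes a JavaScript engine with BigInt
support; the variable names are illustrative only.

```javascript
// A well-formed JSON number that is one more than 2^53.
const big = '9007199254740993';

// JSON.parse maps it to an IEEE 754 double, which cannot hold that value.
const asDouble = JSON.parse(big);
// A consumer using arbitrary-precision integers keeps every digit.
const asBigInt = BigInt(big);

console.log(String(asDouble));    // 9007199254740992
console.log(asBigInt.toString()); // 9007199254740993

// A well-formed JSON object with a duplicated member name.
const dup = '{"a": 1, "a": 2}';

// JSON.parse keeps the last occurrence; other consumers might keep the
// first, keep both, or reject the text.
const lastWins = JSON.parse(dup).a;
console.log(lastWins); // 2
```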
What we can do is draw a bright line just above the level of static
semantics. This is what ECMA-404 attempts to do. It defines a small set of
structuring elements that can be recursively composed and represented in a
textual encoding. It provides a common vocabulary upon which various semantics
can be overlaid and nothing else. The intent of ECMA-404 is to provide the
definitive specification of the syntax and static semantics of the JSON format
that can be used by higher-level semantic specifications.
Allen Wirfs-Brock
ECMA-262 project editor
_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss