I hadn't realized JSON parsers have gotten that sophisticated -- I've just
used the standard DOM and SAX parsers.

That said, I don't like the idea of returning E_SYNTAX when a field has the
wrong type (or is missing). That strikes me as very confusing to a client
programmer. As currently defined, E_SYNTAX doesn't say where the error
occurred. Odds are that the client programmer will use a generic JSON parser
to verify the JSON she sent, and that parser will say the JSON is perfectly
valid. And then the programmer will go, "What the ...?"

So I'd reserve E_SYNTAX for pure JSON syntax errors.  Use E_MISSING_FIELD or
E_INVALID_FIELD_TYPE for schema violations.
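To illustrate the split, here is a minimal Python sketch. It assumes a server that parses the request body with the standard `json` module and then checks each required field itself. The field table (`cost-type` as a string, borrowing the example from the quoted message in this thread, plus a hypothetical `endpoints` array) and the `classify_request` helper are illustrative assumptions, not definitions from the ALTO protocol.

```python
import json

# Hypothetical required-field table: field name -> expected Python type.
# "cost-type" as a string follows the example in this thread; "endpoints"
# is an illustrative assumption.
REQUIRED_FIELDS = {
    "cost-type": str,
    "endpoints": list,
}


def classify_request(body):
    """Return (error_code, detail) on failure, or (None, parsed_object)."""
    # Phase 1: pure JSON syntax. Only this phase ever reports E_SYNTAX,
    # and it can say where the syntax error occurred.
    try:
        obj = json.loads(body)
    except json.JSONDecodeError as exc:
        return "E_SYNTAX", f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    # Assumption: a well-formed body that is not a JSON object is treated
    # as a schema (type) violation rather than a syntax error.
    if not isinstance(obj, dict):
        return "E_INVALID_FIELD_TYPE", "request body must be a JSON object"
    # Phase 2: schema checks on syntactically valid JSON, naming the field.
    for name, expected in REQUIRED_FIELDS.items():
        if name not in obj:
            return "E_MISSING_FIELD", name
        if not isinstance(obj[name], expected):
            return "E_INVALID_FIELD_TYPE", name
    return None, obj
```

With this split, `'{"endpoints": []}'` yields `("E_MISSING_FIELD", "cost-type")` and a body where `cost-type` is an array yields `("E_INVALID_FIELD_TYPE", "cost-type")`, so a generic JSON validator and the server never disagree about what "syntax error" means.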

I didn't look into the details of the JSON parsers you cited, but wouldn't the
parser distinguish between "JSON syntax error" and "doesn't follow schema"?
I believe XML parsers make that distinction when validating against a DTD.

Also, having written a few compilers early in my career, I learned the hard
way that it was better to extend the YACC grammar to accept some common
errors, detect them in action routines, and give the user a detailed
explanation of the error. When I let YACC give a generic "syntax error",
users would bang on my door telling me the compiler was broken.

- Wendy

From:  "Y. Richard Yang" <[email protected]>
Date:  Wed, October 9, 2013 17:31
To:  Wendy Roome <[email protected]>
Cc:  IETF ALTO <[email protected]>
Subject:  Re: [alto] Is E_INVALID_FIELD_TYPE necessary?

On Wed, Oct 9, 2013 at 3:37 PM, Wendy Roome <[email protected]>
wrote:
> Do we really need a separate E_INVALID_FIELD_TYPE error code?  Why not
> just fold it into E_MISSING_FIELD, by defining that as "The field is
> missing or it has the wrong type."
> 
> First reason it's unnecessary: if the protocol says the client must
> provide a String field named "cost-type", and the client defines
> "cost-type" as an array, well, the STRING field is missing, isn't it?
> 
> Second reason: JSON libraries rarely distinguish between "missing" and
> "wrong type". E.g., getString("foo") usually returns null if "foo" doesn't
> exist or if it exists but isn't a string. The server has to do additional
> analysis to distinguish between the two cases.
> 
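For comparison, a small Python sketch of the second point. `get_string` imitates the null-returning `getString("foo")` behavior described above; `lookup_string` and the `Lookup` enum are hypothetical alternatives (not part of any real JSON library) that report which case occurred. The extra analysis amounts to one membership test.

```python
from enum import Enum


class Lookup(Enum):
    OK = "ok"
    MISSING = "missing"
    WRONG_TYPE = "wrong-type"


def get_string(obj, key):
    """Mimics a library getString(): None if absent OR not a string."""
    value = obj.get(key)
    return value if isinstance(value, str) else None


def lookup_string(obj, key):
    """Same lookup, but says why it failed: (status, value or None)."""
    if key not in obj:          # the one extra check
        return Lookup.MISSING, None
    value = obj[key]
    if not isinstance(value, str):
        return Lookup.WRONG_TYPE, None
    return Lookup.OK, value
```

Both `get_string({}, "foo")` and `get_string({"foo": 1}, "foo")` return `None`, whereas `lookup_string` returns `Lookup.MISSING` and `Lookup.WRONG_TYPE` respectively.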

Suppose one uses Data Binding (e.g.,
http://wiki.fasterxml.com/JacksonInFiveMinutes) for deserialization. In a
strongly typed language, a wrong type will cause deserialization to fail,
depending on which deserialization features are enabled
(https://github.com/FasterXML/jackson-databind/wiki/Deserialization-Features).
Hence, with a strict parser, a wrong type (e.g., a number where a string is
expected) will cause the parser to fail, and the server may only know that it
is a syntax error, E_SYNTAX. I agree that a missing field will cause the same.
In this sense, they are all syntax errors in a PL sense, so a more appropriate
coarse error is E_SYNTAX: E_SYNTAX is a base class with two specific subtypes
(missing field and wrong type). What do you think?
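The "base class with two subtypes" idea can be sketched as an exception hierarchy. In this hypothetical Python version (the class names and the `error_response` shape are illustrative; nothing here comes from the ALTO drafts), a server that only learns "deserialization failed" raises the coarse class and reports E_SYNTAX, while a server that knows more raises a subclass and reports the specific code. Code that catches only the base class still handles both subtypes.

```python
class AltoSyntaxError(Exception):
    """Coarse error: maps to E_SYNTAX."""
    code = "E_SYNTAX"

    def __init__(self, detail=""):
        super().__init__(detail)
        self.detail = detail


class MissingFieldError(AltoSyntaxError):
    """Specific subtype: a required field is absent."""
    code = "E_MISSING_FIELD"


class InvalidFieldTypeError(AltoSyntaxError):
    """Specific subtype: a field has the wrong JSON type."""
    code = "E_INVALID_FIELD_TYPE"


def error_response(exc):
    # Hypothetical response body: the most specific code the server knows,
    # plus whatever detail (e.g., the field name) it could determine.
    return {"code": exc.code, "detail": exc.detail}
```

The trade-off Wendy raises still applies: if a server always falls back to the base class, the client never learns which field was at fault.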



_______________________________________________
alto mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/alto
