I hadn't realized JSON parsers have gotten that sophisticated -- I've just used the standard DOM and SAX parsers.
That said, I don't like the idea of returning E_SYNTAX when a field has the wrong type (or is missing). That strikes me as very confusing to a client programmer. As currently defined, E_SYNTAX doesn't say where the error occurred. Odds are that the client programmer will use a generic JSON parser to verify the JSON she sent, and that parser will say the JSON is perfectly valid. And then the programmer will go, "What the .?"

So I'd reserve E_SYNTAX for pure JSON syntax errors. Use E_MISSING_FIELD or E_INVALID_FIELD_TYPE for schema violations.

I didn't go into the details of the JSON parsers you cited, but wouldn't the parser distinguish between "JSON syntax error" and "doesn't follow schema"? I believe XML parsers make that distinction when validating against a DTD.

Also, having written a few compilers early in my career, I learned the hard way that it was better to extend the YACC grammar to accept some common errors, detect them in action routines, and give the user a detailed explanation of the error. When I let YACC give a generic "syntax error", users would bang on my door telling me the compiler was broken.

- Wendy

From: "Y. Richard Yang" <[email protected]>
Date: Wed, October 9, 2013 17:31
To: Wendy Roome <[email protected]>
Cc: IETF ALTO <[email protected]>
Subject: Re: [alto] Is E_INVALID_FIELD_TYPE necessary?

On Wed, Oct 9, 2013 at 3:37 PM, Wendy Roome <[email protected]> wrote:

> Do we really need a separate E_INVALID_FIELD_TYPE error code? Why not
> just fold it into E_MISSING_FIELD, by defining that as "The field is
> missing or it has the wrong type."
>
> First reason it's unnecessary: if the protocol says the client must
> provide a String field named "cost-type", and the client defines
> "cost-type" as an array, well, the STRING field is missing, isn't it?
>
> Second reason: JSON libraries rarely distinguish between "missing" and
> "wrong type". Eg, getString("foo") usually returns null if "foo" doesn't
> exist or if it exists but isn't a string. The server has to do additional
> analysis to distinguish between the two cases.
>

Suppose one uses Data Binding (e.g., http://wiki.fasterxml.com/JacksonInFiveMinutes) for deserialization. A wrong type, in a strongly typed language, will cause the deserialization to fail, depending on specified features (https://github.com/FasterXML/jackson-databind/wiki/Deserialization-Features).

Hence, if one uses a strict parser, a wrong type (e.g., a number where a string is expected) will cause the parser to fail, and the server may only know that it is a syntax error E_SYNTAX. I agree that a missing field will cause the same. In this sense, they are all syntax errors in a PL sense.

Hence, a more appropriate coarse error is E_SYNTAX. In this sense, E_SYNTAX is a base class of two specific sub types (missing or wrong type). What do you think?
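
For illustration only (this is not part of either message): a minimal Python sketch of the three-way split argued for at the top of this thread, where E_SYNTAX is kept for malformed JSON and schema violations get their own codes. The error-code names and the "cost-type" field come from the thread. The function name check_request, the REQUIRED_FIELDS table, the (code, field) return shape, and the handling of a non-object top level are all assumptions made up for the example, and nothing here is taken from the ALTO draft.

```python
import json

# Hypothetical schema for illustration: field name -> required Python type.
REQUIRED_FIELDS = {"cost-type": str}


def check_request(body, required=REQUIRED_FIELDS):
    """Return (error_code, field_name), or (None, None) if the body is acceptable."""
    try:
        doc = json.loads(body)
    except json.JSONDecodeError:
        # Not well-formed JSON: the only case reported as E_SYNTAX.
        return ("E_SYNTAX", None)

    if not isinstance(doc, dict):
        # Assumption: a well-formed but non-object body counts as a type error.
        return ("E_INVALID_FIELD_TYPE", None)

    for name, expected in required.items():
        if name not in doc:
            # Valid JSON, but a required field is absent.
            return ("E_MISSING_FIELD", name)
        if not isinstance(doc[name], expected):
            # Valid JSON, field present, wrong type (e.g. array instead of string).
            return ("E_INVALID_FIELD_TYPE", name)

    return (None, None)
```

Because the field name is returned alongside the code, a client whose generic JSON validator reports "valid" can still see which field the server objected to.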
