The JSON Number BNF allows longer numbers, but RFC8259 Section 6 gives everyone a free pass to back JSON numbers in implementations with IEEE754 doubles. "This specification allows" is basically a normative limit. That means that such implementations are not wrong, but they are simply doing what the spec explicitly lets them do.
" This specification allows implementations to set limits on the range and precision of numbers accepted. Since software that implements IEEE 754 binary64 (double precision) numbers [IEEE754] is generally available and widely used, good interoperability can be achieved by implementations that expect no more precision or range than these provide, in the sense that implementations will approximate JSON numbers within the expected precision." You are correct that the entire mapping of long is problematic, but for nano-timestamps, it goes as far as every _value_ around the present day being outside the spec limit. You are also correct that the spec mandates for the decoding party to be in possession of the schema. That requirement severely limits the usefulness of the JSON encoding and is actively causing problems since developers approach JSON features with the expectation that the output is interoperable and can be used wherever JSON can be handled. It is my assessment that Avro Schema is very well suited as a general-purpose schema language for defining data structures. My Avrotize tool (https://github.com/clemensv/avrotize/) proves that Avro Schema's structure and extensibility is a great "middle ground" for conversions between all sorts of different schema models, with the extra benefit of the schemas being usable with the Avro serialization framework. At Microsoft, we are using Avro schema with a handful of annotation extensions (see https://github.com/clemensv/avrotize/blob/master/specs/avrotize-schema.md) as the canonical schema model inside of Microsoft Fabric's data streaming features since we can't tool for a dozen different schema formats and the popular JSON Schema is absolutely awful to write tooling around. It is also my assessment that the JSON encoding defined for the Avro serialization framework is unusable for interoperability scenarios, not only related to the issue at hand. If you give any developer who has ever written a JSON document an Avro schema to look at and ask them to craft a JSON document that conforms with that schema, they will create a document that any other developer who looks at document and schema will nod at and say "looks right". That document will yet be vastly different from the structure the Avro spec asks for. We've done this exercise with quite a few folks inside the company, but just to underline that point, I just asked ChatGPT (o1 model) as one of those "developers": "create a JSON document conformant with this schema" { "type": "record", "namespace": "com.example.recipes", "name": "Recipe", "doc": "Avro schema for describing a cooking recipe.", "fields": [ { "name": "name", "type": "string", "doc": "Name of the recipe." }, { "name": "ingredients", "type": { "type": "array", "items": { "type": "record", "name": "Ingredient", "doc": "Describes an ingredient and its quantity.", "fields": [ { "name": "item", "type": "string", "doc": "Ingredient name." }, { "name": "quantity", "type": "string", "doc": "Amount of the ingredient." } ] } }, "doc": "List of ingredients." }, { "name": "instructions", "type": { "type": "array", "items": "string" }, "doc": "Cooking steps." }, { "name": "servings", "type": "int", "doc": "Number of servings produced." }, { "name": "prepTimeMinutes", "type": "int", "doc": "Minutes of preparation time." }, { "name": "cookTimeMinutes", "type": "int", "doc": "Minutes of cooking time." } ] } The answer is unsurprisingly miles away from how the Avro spec wants it: { "name": "Chocolate Cake", "ingredients": [ { "item": "Flour", "quantity": "2 cups" }, { "item": "Sugar", "quantity": "1.5 cups" }, { "item": "Cocoa Powder", "quantity": "3/4 cup" }, { "item": "Eggs", "quantity": "2" } ], "instructions": [ "Preheat the oven to 350°F", "Grease a round cake pan", "Combine dry ingredients and mix well", "Add eggs and stir to form a batter", "Pour batter into the pan and bake for 30 minutes" ], "servings": 8, "prepTimeMinutes": 15, "cookTimeMinutes": 30 } Now, for fun, I also asked it "create a JSON document conformant with this schema per the rules of the Avro JSON encoding" and it came up with literally the same document. -----Original Message----- From: glywk <glywk.cont...@gmail.com> Sent: Thursday, January 9, 2025 6:40 AM To: dev@avro.apache.org Subject: Re: Add support of time logical type with nanoseconds precision [Sie erhalten nicht häufig E-Mails von glywk.cont...@gmail.com. Weitere Informationen, warum dies wichtig ist, finden Sie unter https://aka.ms/LearnAboutSenderIdentification ] About your timestamp remarks, the current Avro JSON encoding specification makes JSON: - not deserializable without the schema as described in "JSON Encoding"[1] part. *"Note that the original schema is still required to correctly process JSON-encoded data."* - not easily human readable partially due to logical type serialisation[1] *"A logical type is always serialized using its underlying Avro type so that values are encoded in exactly the same way as the equivalent Avro type that does not have a logicalType attribute."* So, the interoperability problem you mentioned is not about timestamp but all fields based on long type because they are stored in memory on 64-bits signed integer. As I interpret the RFC 8259 Section 6 [2] BNF grammar, number limit and precision are not limited. So Avro long type does not break the RFC. But as suggested some implementation limited to IEEE 754 ranges to express integers may be wrong. [1] https://avro.apache.org/docs/1.12.0/specification <https://avro.apache.org/docs/1.12.0/specification/#aliases> [2] https://www.rfc-editor.org/rfc/rfc8259#section-6 Regards Le mer. 8 janv. 2025 à 14:54, Clemens Vasters <cleme...@microsoft.com.invalid> a écrit : > I agree with you proposal staying within the range. However, you > propose this to align with the nanosecond timestamps and those are broken for > JSON. > > Your proposal called that to my attention. > > Clemens > > -----Original Message----- > From: glywk <glywk.cont...@gmail.com> > Sent: Wednesday, January 8, 2025 12:15 AM > To: dev@avro.apache.org > Subject: Re: Add support of time logical type with nanoseconds > precision > > [Sie erhalten nicht häufig E-Mails von glywk.cont...@gmail.com. > Weitere Informationen, warum dies wichtig ist, finden Sie unter > https://aka.ms/LearnAboutSenderIdentification ] > > Hi, > > Your analysis is interesting but about timestamp. My proposal is about > adding nanoseconds support on time logical type. As described in > AVRO-4043 [1], the maximum value of time is 8.64E13. This value > doesn't exceeded the upper range value 2^53-1 recommended for common > interoperability with IEEE > 754 floating point representation. > > [1] > https://issu > es.apache.org%2Fjira%2Fbrowse%2FAVRO-4043&data=05%7C02%7Cclemensv%40mi > crosoft.com%7C5ba379718c234bf9d82608dd30701de8%7C72f988bf86f141af91ab2 > d7cd011db47%7C1%7C0%7C638719980313339875%7CUnknown%7CTWFpbGZsb3d8eyJFb > XB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=isoVHgOvtrllHkh4hcEsZXvcf%2Bz3e9ME87Zz%2F3WjAgY%3D&reserved=0 > > Regards >