Hi,

The approach used in https://github.com/zyu-godaddy/avro-json looks ok to me, but incomplete.

Interpreting strings as ISO_8859_1 is odd (it's been more than 2 decades since UTF-8), but not a problem if it becomes configurable.

The incompleteness comes from union resolution that may not match expectations:

> 12. (union, j) // for any j
> find the first T in union that rule (T,j) succeeds.

This is actually a tricky case, for several reasons:

- Which number is intended in a [long, int, double] union when parsing the number 42? The first fit is long, but int is arguably the better fit.
- What if we parse a [record1, record2] union, and record1 fits by applying default values, but record2 actually contains the properties in the JSON object?

In both cases, the "find the first [...] that succeeds" perfectly solves any conflict. However, it does not always match expectations; especially in combination with record types with default field values.

IMHO, the best way to solve this dilemma is by disallowing any union that can cause such ambiguities in expectations. The simplest option is to disallow any union other than a union with null (i.e., a union to make a field optional). A more general approach is to disallow any union with multiple number types, with both string and bytes, or with multiple record and/or map types.

Kind regards,
Oscar

On Mon, 26 May 2025 at 02:25, z...@godaddy.com.INVALID <z...@godaddy.com.invalid> wrote:

> Avro’s Json encoding retains enough information of writer's schema such
> that it is possible to decode & resolve without writer's schema; only
> reader's schema is needed.
>
> See https://github.com/zyu-godaddy/avro-json
>
> I know that we do not want to encourage such a practice. Nevertheless, it
> is an interesting observation that this is possible. Appreciated if people
> want to double-check the logic.
>
> Zhong Yu
> z...@godaddy.com
>

-- 
✉️ Oscar Westra van Holthe - Kind <opw...@apache.org>
🌐 https://github.com/opwvhk/