Hi,

The approach used in https://github.com/zyu-godaddy/avro-json looks OK to
me, but it is incomplete.

Interpreting strings as ISO_8859_1 is odd (UTF-8 has been around for more
than two decades), but not a problem if it becomes configurable.

The incompleteness comes from union resolution that may not match
expectations:

> 12. (union, j) // for any j
>     find the first T in union that rule (T,j) succeeds.


This is actually a tricky case, for several reasons:

   - Which number type is intended in a [long, int, double] union when
   parsing the number 42? The first fit is long, but int is arguably the
   better fit.
   - What if we parse a [record1, record2] union, and record1 fits by
   applying default values, but record2 actually contains the properties in
   the JSON object?

In both cases, the "find the first [...] that succeeds" rule resolves the
conflict deterministically. However, the result does not always match
expectations, especially in combination with record types that have default
field values.
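
To make the two cases concrete, here is a minimal Python sketch of a
first-match resolver. It is a simplified model of rule 12 and not the code
from the avro-json repository: the names matches, resolve_union,
branch_name, record1 and record2 are made up for illustration, and I assume
that properties unknown to a record are ignored (as in regular schema
resolution).

```python
# Hypothetical, simplified model of rule 12; NOT the avro-json implementation.

def matches(schema, value):
    """Return True if the parsed JSON value fits the (simplified) schema."""
    if schema == "null":
        return value is None
    if schema in ("int", "long"):
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            return False
        bits = 31 if schema == "int" else 63
        return -(2 ** bits) <= value < 2 ** bits
    if schema == "double":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if schema == "string":
        return isinstance(value, str)
    if isinstance(schema, dict) and schema.get("type") == "record":
        if not isinstance(value, dict):
            return False
        # A field fits if it is present and matches, or absent with a default.
        # Assumption: properties not declared in the record are ignored.
        for field in schema["fields"]:
            if field["name"] in value:
                if not matches(field["type"], value[field["name"]]):
                    return False
            elif "default" not in field:
                return False
        return True
    return False


def resolve_union(branches, value):
    """Rule 12: pick the first branch for which the value fits."""
    for branch in branches:
        if matches(branch, value):
            return branch
    raise ValueError("no union branch fits")


def branch_name(schema):
    """Record name for records, the type name for primitives."""
    return schema["name"] if isinstance(schema, dict) else schema


# record1 fits almost any JSON object, because its only field has a default.
record1 = {"type": "record", "name": "record1",
           "fields": [{"name": "a", "type": "string", "default": ""}]}
record2 = {"type": "record", "name": "record2",
           "fields": [{"name": "b", "type": "int"}]}

print(branch_name(resolve_union(["long", "int", "double"], 42)))
print(branch_name(resolve_union([record1, record2], {"b": 1})))
```

In this sketch the first call selects long and the second selects record1,
even though the object only contains the property of record2. Reordering
the union branches changes both outcomes, which is exactly why the result
can surprise people.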

IMHO, the best way to solve this dilemma is by disallowing any union that
can cause such ambiguities in expectations. The simplest option is to
disallow any union other than a union with null (i.e., a union to make a
field optional). A more general approach is to disallow any union with
multiple number types, with both string and bytes, or with multiple record
and/or map types.
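
As a rough illustration of both options, a schema check could look like the
Python sketch below. The function names (type_name, is_optional_union,
ambiguity_reasons) are hypothetical, and the check covers only the
categories I listed above; it is not an existing Avro API.

```python
# Hypothetical union check; a sketch of the two options, not an Avro API.

NUMBER_TYPES = {"int", "long", "float", "double"}
OBJECT_TYPES = {"record", "map"}


def type_name(branch):
    """Type of a union branch: complex types are dicts with a 'type' key."""
    return branch["type"] if isinstance(branch, dict) else branch


def is_optional_union(branches):
    """Simplest option: only allow [null, T] or [T, null]."""
    names = [type_name(b) for b in branches]
    return len(names) == 2 and names.count("null") == 1


def ambiguity_reasons(branches):
    """More general option: list why a union could resolve unexpectedly."""
    names = [type_name(b) for b in branches]
    reasons = []
    if sum(n in NUMBER_TYPES for n in names) > 1:
        reasons.append("multiple number types")
    if "string" in names and "bytes" in names:
        reasons.append("both string and bytes")
    if sum(n in OBJECT_TYPES for n in names) > 1:
        reasons.append("multiple record and/or map types")
    return reasons


print(ambiguity_reasons(["long", "int", "double"]))
print(ambiguity_reasons(["null", "string"]))
```

A union would be rejected when ambiguity_reasons returns a non-empty list
(general approach) or when is_optional_union returns False (simplest
option).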


Kind regards,
Oscar


On Mon, 26 May 2025 at 02:25, z...@godaddy.com.INVALID
<z...@godaddy.com.invalid> wrote:

> Avro’s JSON encoding retains enough information of writer's schema such
> that it is possible to decode & resolve without writer's schema; only
> reader's schema is needed.
>
> See https://github.com/zyu-godaddy/avro-json
>
> I know that we do not want to encourage such a practice. Nevertheless, it
> is an interesting observation that this is possible. Appreciated if people
> want to double-check the logic.
>
> Zhong Yu
> z...@godaddy.com
>


-- 

✉️ Oscar Westra van Holthe - Kind <opw...@apache.org>🌐
https://github.com/opwvhk/
