Deterministic field ordering in derived schemas

Gleb Kanterov Wed, 05 Feb 2020 08:55:29 -0800

Some Beam schema providers use Java reflection to discover the fields of
plain field-based classes and AutoValue classes. This doesn't affect POJOs
with "creators", because the creator's parameters are ordered. We cache
instances of schema coders, but the order in which reflection returns fields
is not guaranteed to be the same across JVMs. As a result, I've seen cases
where the constructed pipeline graph and the output schema are
non-deterministic. This matters most when writing data to external storage,
where the row schema becomes the table schema. A workaround is to apply a
transform that makes the schema deterministic, for instance by ordering
fields by name.
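
To illustrate the underlying issue and the "order by name" idea, here is a
minimal sketch in plain Java. It does not use Beam's API: the class
SortedFieldNames, the Event POJO and the sortedFieldNames helper are
hypothetical names made up for this example. The JDK documents that
Class.getDeclaredFields() returns fields in no particular order, so the
sketch sorts the names to get a stable ordering.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SortedFieldNames {
    // Hypothetical POJO standing in for a class whose schema is inferred.
    static class Event {
        public String userId;
        public long timestamp;
        public double amount;
    }

    // Returns the instance field names of clazz in lexicographic order.
    // getDeclaredFields() makes no ordering promise, so sorting by name
    // is what makes the result reproducible across JVMs.
    static List<String> sortedFieldNames(Class<?> clazz) {
        return Arrays.stream(clazz.getDeclaredFields())
            .filter(f -> !Modifier.isStatic(f.getModifiers()) && !f.isSynthetic())
            .map(Field::getName)
            .sorted()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Prints [amount, timestamp, userId]
        System.out.println(sortedFieldNames(Event.class));
    }
}
```

A real fix would apply the same ordering where the schema is derived, or
as a transform on the rows before the write, rather than on bare field
names as done here.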


I would see a benefit in making schemas deterministic by default or at
least introducing a way to do so without writing custom code. What are your
thoughts?
