Hi everyone, I have several Beam Schema-related questions that I couldn't find exact answers to. Let me give a brief intro first.
The order of fields in a Schema is something users have to pay attention to, iiuc. In other words, two schemas with the same set of fields but in a different order will be considered two different schemas, right?

The Beam Schema doc [1] says:
- "The schema for a PCollection defines elements of that PCollection as an ordered list of named fields."

Also, "Schema.equals(Object)" says [2]:
- "Returns true if two Schemas have the same fields in the same order."

So the order of fields matters. Additionally, since "Schema.equals()" is used in "Row.equals()", two Rows with the same values but differently ordered schemas will be considered different rows. Is that correct?

At the same time, when a schema is generated by different schema providers, the order of fields can be non-deterministic in some cases. For example, "GetterBasedSchemaProvider.toRowFunction(TypeDescriptor)" says [3]:
- "schemaFor is non deterministic - it might return fields in an arbitrary order. The reason why is that Java reflection does not guarantee the order in which it returns fields and methods, and these schemas are often based on reflective analysis of classes."

So, iiuc, this means we can potentially end up with the "same" schema but with a different field order for the same class (a POJO, for example) when it is generated on different JVMs.

Now the actual questions:
- Should two Rows with the same field values, but whose schemas have a different field order, be considered two different rows or not?
- Is the behaviour described above what was intended by the initial schema design?
- If field order is so important, then why?

PS: My question is actually related to "AvroRecordSchema().toRowFunction()", but I guess other SchemaProviders can be affected as well.
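
To make the concern concrete, here is a minimal plain-Java sketch. It is not Beam code: a "schema" is modelled as nothing more than an ordered list of field names, and the class, POJO and method names are made up for illustration. It shows that an order-sensitive comparison treats the same fields in a different order as unequal, and that sorting reflectively discovered field names is one possible way to get an order that does not depend on the JVM.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustration only: an ordered list of field names stands in for a schema.
final class FieldOrderSketch {

  // Hypothetical POJO, used only for this example.
  static final class User {
    String name;
    int age;
  }

  // Field names in whatever order reflection returns them.
  // The Javadoc of Class.getDeclaredFields() does not promise any particular order.
  static List<String> reflectedFieldNames(Class<?> clazz) {
    List<String> names = new ArrayList<>();
    for (Field f : clazz.getDeclaredFields()) {
      // Skip static and compiler-generated fields.
      if (!f.isSynthetic() && !Modifier.isStatic(f.getModifiers())) {
        names.add(f.getName());
      }
    }
    return names;
  }

  // Same field names, sorted alphabetically so the order is deterministic.
  static List<String> sortedFieldNames(Class<?> clazz) {
    List<String> names = reflectedFieldNames(clazz);
    Collections.sort(names);
    return names;
  }

  // Order-insensitive comparison of two field-name lists.
  static boolean sameFieldsIgnoringOrder(List<String> a, List<String> b) {
    List<String> sortedA = new ArrayList<>(a);
    List<String> sortedB = new ArrayList<>(b);
    Collections.sort(sortedA);
    Collections.sort(sortedB);
    return sortedA.equals(sortedB);
  }

  public static void main(String[] args) {
    List<String> a = Arrays.asList("name", "age");
    List<String> b = Arrays.asList("age", "name");

    // Order-sensitive equality: same fields, different order, not equal.
    System.out.println(a.equals(b)); // false

    // Order-insensitive comparison: treated as the same set of fields.
    System.out.println(sameFieldsIgnoringOrder(a, b)); // true

    // Sorting the reflected names gives the same list on every run.
    System.out.println(sortedFieldNames(User.class)); // [age, name]
  }
}
```

Whether sorting by name (or any other normalisation) would be appropriate for real schema providers is exactly the kind of thing I'm unsure about, so please read this only as an illustration of the question.
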
--
Alexey

[1] https://beam.apache.org/documentation/programming-guide/#schema-definition
[2] https://github.com/apache/beam/blob/0262ee53c6018d929a8a40fdf66735cc7e934951/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java#L303
[3] https://github.com/apache/beam/blob/0262ee53c6018d929a8a40fdf66735cc7e934951/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/GetterBasedSchemaProvider.java#L91
