Hi everyone,

I have several questions about Beam Schemas, since I couldn't find exact 
answers to them. Let me start with a brief introduction.

The order of fields in a Schema is something users have to pay attention to, 
iiuc. In other words, two schemas with the same set of fields, but in a 
different order, will be considered two different schemas, right? 

The Beam Schema documentation [1] says:
- "The schema for a PCollection defines elements of that PCollection as an 
ordered list of named fields."

Also, "Schema.equals(Object)" says [2]:
- "Returns true if two Schemas have the same fields in the same order."

So, field order matters.

Additionally, since "Schema.equals()" is used in "Row.equals()", it means 
that two Rows with differently ordered schemas but the same values will be 
considered different rows. Is that correct?
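To make the question concrete, here is a small, self-contained Java sketch. It deliberately does not use Beam: "SimpleSchema" and "SimpleRow" are hypothetical stand-ins that only mimic the equality semantics described above (schema equality is order-sensitive, and row equality checks the schema first). "sameContentIgnoringOrder" is likewise a hypothetical helper showing what an order-insensitive comparison could look like; it is not a Beam API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model for illustration only. These are NOT Beam's
// Schema and Row classes; they only mimic the equality semantics discussed above.
public class OrderedSchemaDemo {

    // A field is just a name plus a type label in this sketch.
    public record Field(String name, String type) {}

    // Record equality delegates to List.equals, which is order-sensitive:
    // "same fields in the same order", like the documented Schema.equals().
    public record SimpleSchema(List<Field> fields) {}

    // Record equality compares the schema first, then the values position by position.
    public record SimpleRow(SimpleSchema schema, List<Object> values) {

        // Order-insensitive view of the row: field name -> value.
        public Map<String, Object> asNameToValue() {
            Map<String, Object> byName = new HashMap<>();
            for (int i = 0; i < values.size(); i++) {
                byName.put(schema.fields().get(i).name(), values.get(i));
            }
            return byName;
        }
    }

    // Hypothetical helper: same set of fields and the same value per field name,
    // regardless of the order in which the fields appear in each schema.
    public static boolean sameContentIgnoringOrder(SimpleRow a, SimpleRow b) {
        return new HashSet<>(a.schema().fields()).equals(new HashSet<>(b.schema().fields()))
            && a.asNameToValue().equals(b.asNameToValue());
    }

    public static void main(String[] args) {
        SimpleSchema idFirst = new SimpleSchema(List.of(new Field("id", "INT64"), new Field("name", "STRING")));
        SimpleSchema nameFirst = new SimpleSchema(List.of(new Field("name", "STRING"), new Field("id", "INT64")));

        SimpleRow r1 = new SimpleRow(idFirst, List.of(1L, "alice"));
        SimpleRow r2 = new SimpleRow(nameFirst, List.of("alice", 1L));

        System.out.println(idFirst.equals(nameFirst));        // false
        System.out.println(r1.equals(r2));                    // false
        System.out.println(sameContentIgnoringOrder(r1, r2)); // true
    }
}
```

With order-sensitive equality, r1 and r2 are "different" even though every field name maps to the same value; only the explicit order-insensitive check treats them as the same data.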

At the same time, when a schema is generated by a schema provider, the order 
of fields can be non-deterministic in some cases.

For example, the Javadoc of "GetterBasedSchemaProvider.toRowFunction(TypeDescriptor)" 
says [3]:
- "schemaFor is non deterministic - it might return fields in an arbitrary 
order. The reason why is that Java reflection does not guarantee the order in 
which it returns fields and methods, and these schemas are often based on 
reflective analysis of classes."

So, iiuc, this means that we can potentially get the "same" schema, but with a 
different field order, for the same class (for example, a POJO) when the schema 
is generated on different JVMs. 
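A minimal illustration of where this non-determinism comes from, again in plain Java rather than Beam code: the schema providers' internals are not shown, and the "User" class and both helper methods are hypothetical. The Javadoc of Class.getDeclaredFields() states that the returned array is not sorted and not in any particular order, so only the sorted variant below is guaranteed to be stable across JVMs.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.List;

public class ReflectionOrderDemo {

    // Hypothetical POJO used only for this illustration.
    public static class User {
        public String name;
        public long id;
        public int age;
    }

    // Instance field names in whatever order reflection returns them. The Javadoc of
    // Class.getDeclaredFields() says the array is "not sorted and not in any particular
    // order", so this list is not guaranteed to be identical on every JVM.
    public static List<String> reflectedOrder(Class<?> clazz) {
        return Arrays.stream(clazz.getDeclaredFields())
            .filter(f -> !Modifier.isStatic(f.getModifiers()))
            .map(Field::getName)
            .toList();
    }

    // One hypothetical way to obtain a stable order: sort by field name.
    // Illustration only; it is not meant to describe what Beam's schema providers do.
    public static List<String> sortedOrder(Class<?> clazz) {
        return reflectedOrder(clazz).stream().sorted().toList();
    }

    public static void main(String[] args) {
        System.out.println("reflected: " + reflectedOrder(User.class)); // JVM-dependent order
        System.out.println("sorted:    " + sortedOrder(User.class));    // [age, id, name]
    }
}
```

Both lists always contain the same three names; only their order is at stake, which is exactly what an order-sensitive Schema.equals() would be sensitive to.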

And now, the actual questions: 
- Should two Rows with the same field values, but with schemas that have a 
different field order, be considered different rows or not?
- Is the behaviour explained above what was intended by the initial schema 
design? 
- If field order is this important, why is that?

PS: My question is actually related to "AvroRecordSchema().toRowFunction()", but 
I guess other SchemaProviders can be affected as well.


—
Alexey

[1] https://beam.apache.org/documentation/programming-guide/#schema-definition
[2] https://github.com/apache/beam/blob/0262ee53c6018d929a8a40fdf66735cc7e934951/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java#L303
[3] https://github.com/apache/beam/blob/0262ee53c6018d929a8a40fdf66735cc7e934951/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/GetterBasedSchemaProvider.java#L91
