Thanks for the answers, Reuven.

Please see the additional questions inline.

> On 5 Apr 2022, at 20:07, Reuven Lax <[email protected]> wrote:
>
> On Tue, Apr 5, 2022 at 9:55 AM Alexey Romanenko <[email protected]> wrote:
>
> So, the different fields order matters.
>
> Additionally, since "Schema.equals()" is used in "Row.equals()", then it
> means that two Rows with different-ordered schemas but the same values will
> be considered as different rows. Is it correct?
>
> Yes, but there are ways of dealing with this:

But what is the point of this? Why can the field order be important, and under
which circumstances?

> 1. If using Dataflow, the pipeline update feature allows you to update to a
> compatible schema (i.e. one in which the fields have the same names but a
> different order)
> 2. You can use the Convert transform to convert rows to a compatible schema
> with a different order.

Well, for now it’s mostly related to unit tests (e.g.
AvroSchemaTest.testPojoRecordToRow()), where we compare a manually created row
with another row that is created from a POJO with AvroRecordSchema. I’m playing
with an Avro version upgrade [1], and the test fails because some changes in
Avro make it create an Avro schema with a different order of fields. So I’m
actually wondering what we can do about that.

[1] https://github.com/apache/beam/pull/17246

>
> In the same time, while generating a schema with different schema providers,
> the order of fields can be non-deterministic for some cases.
>
> For example, “GetterBasedSchemaProvider.toRowFunction(TypeDescriptor)” says
> [3] that:
> - “schemaFor is non deterministic - it might return fields in an arbitrary
> order. The reason why is that Java reflection does not guarantee the order in
> which it returns fields and methods, and these schemas are often based on
> reflective analysis of classes.”
>
> So, iiuc, it means that potentially we can have the "same" schema but with
> different fields order for the same, for example, POJO class but generated on
> different JVMs.
>
> Correct, and see above.
>
>
> And actually the questions:
> - Two Rows with the same field values but with two schemas of different
> fields order should be considered as two different rows or not?
> - This behaviour explained above - is this that was expected by initial
> schema design?
> - If fields order is so important then why?
>
> PS: My question is actually related to "AvroRecordSchema().toRowFunction()"
> but I guess other SchemaProvider’s also can be affected.
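
To make the first question concrete, here is a minimal sketch in plain Java (no
Beam dependency) of the two possible equality semantics. Everything in it is a
hypothetical illustration and not a Beam API: the class FieldOrderDemo, its two
methods, and the use of an insertion-ordered map as a stand-in for a Row with
its schema.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: an insertion-ordered map plays the role of a Row
// (field name -> value, in schema order). This is not Beam code.
final class FieldOrderDemo {

  // Order-sensitive: field names and values must match position by position.
  static boolean equalsByPosition(Map<String, Object> a, Map<String, Object> b) {
    // Copying the entries into lists keeps each map's iteration order,
    // and List.equals compares the elements pairwise in that order.
    List<Map.Entry<String, Object>> left = new ArrayList<>(a.entrySet());
    List<Map.Entry<String, Object>> right = new ArrayList<>(b.entrySet());
    return left.equals(right);
  }

  // Order-insensitive: Map.equals only checks that both maps hold the same
  // name/value pairs, regardless of iteration order.
  static boolean equalsByName(Map<String, Object> a, Map<String, Object> b) {
    return a.equals(b);
  }

  public static void main(String[] args) {
    // "Manually created" row: fields declared as (name, count).
    Map<String, Object> manual = new LinkedHashMap<>();
    manual.put("name", "beam");
    manual.put("count", 42);

    // "Generated" row: same values, fields discovered as (count, name).
    Map<String, Object> generated = new LinkedHashMap<>();
    generated.put("count", 42);
    generated.put("name", "beam");

    System.out.println(equalsByPosition(manual, generated)); // false
    System.out.println(equalsByName(manual, generated));     // true
  }
}
```

As far as I understand the answers above, the order-sensitive variant
corresponds to what Row.equals() does today, and a test that wants the second
behaviour would have to compare fields by name or convert both rows to one
schema first.
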
>
>
> --
> Alexey
>
> [1] https://beam.apache.org/documentation/programming-guide/#schema-definition
> [2] https://github.com/apache/beam/blob/0262ee53c6018d929a8a40fdf66735cc7e934951/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java#L303
> [3] https://github.com/apache/beam/blob/0262ee53c6018d929a8a40fdf66735cc7e934951/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/GetterBasedSchemaProvider.java#L91
