Thanks for the advice, Gaurav, but I don’t think it will work in my case: in that unit test (AvroSchemaTest.testPojoRecordToRow() [1]) the Beam schema is inferred from an Avro schema, which in turn was created from the POJO. So I guess this annotation will be lost during the POJO -> Avro conversion.
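To make the failing comparison concrete, here is a minimal plain-Java sketch of the problem discussed in the quoted thread: two rows holding the same values under the same field names, but in a different field order. It deliberately does not use any Beam classes. `FieldOrderSketch`, `SimpleRow`, `strictEquals` and `equivalentTo` are hypothetical names made up for this illustration, and the map-based comparison is only one possible way a test could ignore the order, not what Beam does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FieldOrderSketch {

    /** Hypothetical stand-in for a row: ordered field names plus positional values. Not a Beam class. */
    public static final class SimpleRow {
        private final List<String> fieldNames;
        private final List<Object> values;

        public SimpleRow(List<String> fieldNames, List<?> values) {
            if (fieldNames.size() != values.size()) {
                throw new IllegalArgumentException("one value per field is required");
            }
            this.fieldNames = new ArrayList<>(fieldNames);
            this.values = new ArrayList<Object>(values);
        }

        /** Order-sensitive: field names and values must match position by position. */
        public boolean strictEquals(SimpleRow other) {
            return fieldNames.equals(other.fieldNames) && values.equals(other.values);
        }

        /** Order-insensitive: compares rows as name -> value maps (assumes unique field names). */
        public boolean equivalentTo(SimpleRow other) {
            return toMap().equals(other.toMap());
        }

        private Map<String, Object> toMap() {
            Map<String, Object> byName = new HashMap<>();
            for (int i = 0; i < fieldNames.size(); i++) {
                byName.put(fieldNames.get(i), values.get(i));
            }
            return byName;
        }
    }

    public static void main(String[] args) {
        // Same fields and values, declared in a different order.
        SimpleRow expected = new SimpleRow(Arrays.asList("name", "age"), Arrays.asList("alice", 30));
        SimpleRow actual = new SimpleRow(Arrays.asList("age", "name"), Arrays.asList(30, "alice"));

        System.out.println("strict: " + expected.strictEquals(actual));
        System.out.println("equivalent: " + expected.equivalentTo(actual));
    }
}
```

Running `main` prints `strict: false` and `equivalent: true`, which mirrors the situation in the test: a position-by-position comparison fails as soon as the inferred schema lists the fields in another order, even though the data is the same.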
[1] https://github.com/apache/beam/blob/9317462f73cd3aeb42145ba41ba3b1ef0f72674b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/AvroSchemaTest.java#L448

> On 5 Apr 2022, at 22:26, gaurav mishra <[email protected]> wrote:
>
> There was an annotation introduced in 2.37 to make sure we get the same order
> of fields in schema inferred from a POJO.
> https://javadoc.io/doc/org.apache.beam/beam-sdks-java-core/latest/org/apache/beam/sdk/schemas/annotations/SchemaFieldNumber.html
>
> with that annotation schemaRegistry.getSchema(dataClass) should give you
> schema with the same field order.
>
> On Wed, Apr 6, 2022 at 1:35 AM Alexey Romanenko <[email protected]> wrote:
> Thanks for answers, Reuven. Please see the additional questions inline.
>
>> On 5 Apr 2022, at 20:07, Reuven Lax <[email protected]> wrote:
>>
>> On Tue, Apr 5, 2022 at 9:55 AM Alexey Romanenko <[email protected]> wrote:
>>
>> So, the different fields order matters.
>>
>> Additionally, since "Schema.equals()" is used in "Row.equals()", then it
>> means that two Rows with different-ordered schemas but the same values will
>> be considered as different rows. Is it correct?
>>
>> Yes, but there are ways of dealing with this:
>
> But what is a point of this? Why the fields order can be important, under
> which circumstances?
>
>> 1. If using Dataflow, the pipeline update feature allows you to update to a
>> compatible schema (i.e. one in which the fields have the same names but a
>> different order)
>> 2. You can use the Convert transform to convert rows to a compatible schema
>> with a different order.
>
> Well, for now it’s mostly related to unit tests (e.g.
> AvroSchemaTest.testPojoRecordToRow()) when we compare a manually created row
> with another row that is created from a POJO with AvroRecordSchema. I’m
> playing with an Avro version upgrade [1] and it fails because there are some
> changes in Avro and it creates an Avro schema with a different order of
> fields. So, actually I’m thinking what we can do here with that.
>
> [1] https://github.com/apache/beam/pull/17246
>
>>
>> In the same time, while generating a schema with different schema providers,
>> the order of fields can be non-deterministic for some cases.
>>
>> For example, "GetterBasedSchemaProvider.toRowFunction(TypeDescriptor)" says
>> [3] that:
>> - "schemaFor is non deterministic - it might return fields in an arbitrary
>> order. The reason why is that Java reflection does not guarantee the order
>> in which it returns fields and methods, and these schemas are often based on
>> reflective analysis of classes."
>>
>> So, iiuc, it means that potentially we can have the "same" schema but with
>> different fields order for the same, for example, POJO class but generated
>> on different JVMs.
>>
>> Correct, and see above.
>>
>> And actually the questions:
>> - Two Rows with the same field values but with two schemas of different
>> fields order should be considered as two different rows or not?
>> - This behaviour explained above - is this that was expected by initial
>> schema design?
>> - If fields order is so important then why?
>>
>> PS: My question is actually related to "AvroRecordSchema().toRowFunction()"
>> but I guess other SchemaProvider’s also can be affected.
>>
>> --
>> Alexey
>>
>> [1] https://beam.apache.org/documentation/programming-guide/#schema-definition
>> [2] https://github.com/apache/beam/blob/0262ee53c6018d929a8a40fdf66735cc7e934951/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java#L303
>> [3] https://github.com/apache/beam/blob/0262ee53c6018d929a8a40fdf66735cc7e934951/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/GetterBasedSchemaProvider.java#L91
