Thanks for the advice, Gaurav, but I don’t think it will work in my case: in 
that unit test (AvroSchemaTest.testPojoRecordToRow() [1]) the Beam schema is 
inferred from an Avro schema, which in turn was created from the POJO. So I 
guess the annotation will be lost during the POJO -> Avro conversion. 

[1] 
https://github.com/apache/beam/blob/9317462f73cd3aeb42145ba41ba3b1ef0f72674b/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/AvroSchemaTest.java#L448
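
To make the field-order problem from this thread concrete, here is a minimal 
plain-Java sketch. It does not use Beam or Avro at all: the Field class and 
the two helper methods are hypothetical stand-ins, and the field names and 
types are invented for illustration. It only shows why an order-sensitive 
comparison fails for two field lists that hold the same fields, and what an 
order-insensitive comparison could look like, assuming field names are unique.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

// Plain-Java illustration only; these are NOT Beam classes.
class FieldOrderDemo {

  // Hypothetical stand-in for a schema field: a name plus a type name.
  static final class Field {
    final String name;
    final String type;

    Field(String name, String type) {
      this.name = name;
      this.type = type;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof Field)) {
        return false;
      }
      Field other = (Field) o;
      return name.equals(other.name) && type.equals(other.type);
    }

    @Override
    public int hashCode() {
      return Objects.hash(name, type);
    }
  }

  // Order-sensitive: List.equals compares element by element, in order.
  static boolean sameFieldsSameOrder(List<Field> a, List<Field> b) {
    return a.equals(b);
  }

  // Order-insensitive: sort copies by field name, then compare.
  // Assumes field names are unique within each list.
  static boolean sameFieldsAnyOrder(List<Field> a, List<Field> b) {
    return sortedByName(a).equals(sortedByName(b));
  }

  private static List<Field> sortedByName(List<Field> fields) {
    List<Field> copy = new ArrayList<>(fields);
    copy.sort(Comparator.comparing((Field f) -> f.name));
    return copy;
  }

  public static void main(String[] args) {
    // Same two fields, declared in a different order.
    List<Field> manual =
        Arrays.asList(new Field("id", "INT32"), new Field("name", "STRING"));
    List<Field> inferred =
        Arrays.asList(new Field("name", "STRING"), new Field("id", "INT32"));

    System.out.println(sameFieldsSameOrder(manual, inferred)); // false
    System.out.println(sameFieldsAnyOrder(manual, inferred));  // true
  }
}
```

In a test like the one above, comparing in the second way would make the 
assertion independent of the order in which the schema provider returns fields.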

> On 5 Apr 2022, at 22:26, gaurav mishra <[email protected]> wrote:
> 
> There was an annotation introduced in 2.37 to make sure we get the same order 
> of fields in schema inferred from a POJO. 
> https://javadoc.io/doc/org.apache.beam/beam-sdks-java-core/latest/org/apache/beam/sdk/schemas/annotations/SchemaFieldNumber.html
> 
> With that annotation, schemaRegistry.getSchema(dataClass) should give you a 
> schema with the same field order. 
> 
> 
> On Wed, Apr 6, 2022 at 1:35 AM Alexey Romanenko <[email protected]> wrote:
> Thanks for the answers, Reuven. Please see the additional questions inline.
> 
>> On 5 Apr 2022, at 20:07, Reuven Lax <[email protected]> wrote:
>> 
>> On Tue, Apr 5, 2022 at 9:55 AM Alexey Romanenko <[email protected]> wrote:
>> 
>> So, a different field order matters.
>> 
>> Additionally, since "Schema.equals()" is used in "Row.equals()", it means 
>> that two Rows with differently ordered schemas but the same values will be 
>> considered different rows. Is that correct?
>> 
>> Yes, but there are ways of dealing with this:
> 
> But what is the point of this? Why can the field order be important, and 
> under which circumstances?
> 
>> 1. If using Dataflow, the pipeline update feature allows you to update to a 
>> compatible schema (i.e. one in which the fields have the same names but a 
>> different order)
>> 2. You can use the Convert transform to convert rows to a compatible schema 
>> with a different order.
> 
> Well, for now it’s mostly related to unit tests (e.g. 
> AvroSchemaTest.testPojoRecordToRow()), where we compare a manually created 
> row with another row that is created from a POJO with AvroRecordSchema. I’m 
> playing with an Avro version upgrade [1] and the test fails because, after 
> some changes in Avro, it creates an Avro schema with a different order of 
> fields. So I’m actually wondering what we can do about that.
> 
> [1] https://github.com/apache/beam/pull/17246
> 
>> 
>> At the same time, when generating a schema with different schema providers, 
>> the order of fields can be non-deterministic in some cases.
>> 
>> For example, "GetterBasedSchemaProvider.toRowFunction(TypeDescriptor)" says 
>> [3] that:
>> - "schemaFor is non deterministic - it might return fields in an arbitrary 
>> order. The reason why is that Java reflection does not guarantee the order 
>> in which it returns fields and methods, and these schemas are often based on 
>> reflective analysis of classes."
>> 
>> So, iiuc, it means that we can potentially get the "same" schema but with a 
>> different field order for the same class (a POJO, for example) when it is 
>> generated on different JVMs. 
>> 
>> Correct, and see above.
>>  
>> 
>> And now the actual questions: 
>> - Should two Rows with the same field values, but with two schemas of 
>> different field order, be considered two different rows or not?
>> - Is the behaviour explained above what the initial schema design 
>> intended? 
>> - If field order is so important, then why?
>> 
>> PS: My question is actually related to "AvroRecordSchema().toRowFunction()", 
>> but I guess other SchemaProviders can be affected as well.
>> 
>> 
>> —
>> Alexey
>> 
>> [1] 
>> https://beam.apache.org/documentation/programming-guide/#schema-definition
>> [2] 
>> https://github.com/apache/beam/blob/0262ee53c6018d929a8a40fdf66735cc7e934951/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java#L303
>> [3] 
>> https://github.com/apache/beam/blob/0262ee53c6018d929a8a40fdf66735cc7e934951/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/GetterBasedSchemaProvider.java#L91
