> On 6 Apr 2022, at 17:26, Reuven Lax <[email protected]> wrote:
> 
>> Check out SchemaTestUtils.equivalentTo. It should allow you to test that two 
>> rows are equivalent (i.e. have the same fields, but possibly in a different 
>> order).
> 
> Thanks, I already did that for another test - AvroSchemaTest.testPojoSchema() 
> - where we compare the schema [1], not rows. 
> 
> Though, I’m not sure this is the right workaround. If the goal of this 
> test is to check that the Beam schema created from AvroPojo is the SAME as 
> the default Beam Pojo schema, then it’s not correct because, as you said 
> above, from the Beam perspective they will be considered two different 
> schemas because of the different field order. 
> 
> Why is that the goal of the test?

Well, I’m just guessing, based on the name of this test and its initial code 
(which uses Schema.equals() to compare the schemas). Am I mistaken here? 

> The same issue affects the AvroSchemaTest.testPojoRecordToRow() test, where 
> we compare rows; it fails because Row.equals() also compares the schemas:
> 
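> class Row {
>   @Override
>   public boolean equals(Object o) {
>     …
>     // Rows whose schemas differ (including by field order) are never equal.
>     if (!Objects.equals(getSchema(), other.getSchema())) {
>       return false;
>     }
>     …
>   }
> } 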
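As a rough illustration of why such a row comparison fails, and of one possible way to compare rows by content only, here is a small sketch. It is an assumption-laden toy, not Beam code: RowOrderSketch, SimpleRow, byName and sameContent are hypothetical names, the "schema" is reduced to a list of field names, and field names are assumed unique.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical, simplified stand-in for a row: NOT Beam's Row class.
class RowOrderSketch {

  static final class SimpleRow {
    final List<String> fieldNames; // plays the role of the schema
    final List<Object> values;     // stored positionally, in schema order

    SimpleRow(List<String> fieldNames, List<Object> values) {
      if (fieldNames.size() != values.size()) {
        throw new IllegalArgumentException("one value per field expected");
      }
      this.fieldNames = fieldNames;
      this.values = values;
    }

    // Same shape as the equals() quoted above: schemas first, then values.
    @Override
    public boolean equals(Object o) {
      if (!(o instanceof SimpleRow)) {
        return false;
      }
      SimpleRow other = (SimpleRow) o;
      if (!Objects.equals(fieldNames, other.fieldNames)) {
        return false;
      }
      return Objects.equals(values, other.values);
    }

    @Override
    public int hashCode() {
      return Objects.hash(fieldNames, values);
    }

    // View keyed by field name, so it does not depend on field order.
    Map<String, Object> byName() {
      Map<String, Object> result = new HashMap<>();
      for (int i = 0; i < fieldNames.size(); i++) {
        result.put(fieldNames.get(i), values.get(i));
      }
      return result;
    }
  }

  // Order-insensitive comparison: same field names mapped to equal values.
  static boolean sameContent(SimpleRow a, SimpleRow b) {
    return a.byName().equals(b.byName());
  }

  public static void main(String[] args) {
    SimpleRow a = new SimpleRow(List.of("id", "name"), List.of(1L, "a"));
    SimpleRow b = new SimpleRow(List.of("name", "id"), List.of("a", 1L));

    System.out.println(a.equals(b));       // false
    System.out.println(sameContent(a, b)); // true
  }
}
```

Note that once the field order differs, the positional values differ too, so a by-name comparison is needed rather than just skipping the schema check.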
> 
> [1] https://github.com/apache/beam/pull/17246/files#diff-ca874b6d378d007a590c7eb781635275623fd6d300ab1330f73c29951e7dc505R380
> 
> —
> Alexey
> 
> 
> 
>>  
>> 
>> [1] https://github.com/apache/beam/pull/17246
>> 
>>> 
>>> At the same time, when a schema is generated by different schema 
>>> providers, the order of fields can be non-deterministic in some cases.
>>> 
>>> For example, “GetterBasedSchemaProvider.toRowFunction(TypeDescriptor)” says 
>>> [3] that:
>>> - “schemaFor is non deterministic - it might return fields in an arbitrary 
>>> order. The reason why is that Java reflection does not guarantee the order 
>>> in which it returns fields and methods, and these schemas are often based 
>>> on reflective analysis of classes.”
>>> 
>>> So, iiuc, this means we can potentially end up with the "same" schema but 
>>> with a different field order for the same class (a POJO, for example) 
>>> when it is generated on different JVMs. 
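For context, the JDK documentation for Class.getDeclaredFields() states that the returned elements are not sorted and are not in any particular order. A minimal sketch of one way to get a stable field order is to sort the reflected names explicitly. This is only an illustration and is not what Beam's schema providers do; ReflectionOrderSketch, SamplePojo and sortedFieldNames are hypothetical names.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustration only: derive a deterministic field order from reflection.
class ReflectionOrderSketch {

  // Hypothetical POJO used for the example.
  static class SamplePojo {
    public long id;
    public String name;
    public double score;
  }

  // Returns the declared field names sorted alphabetically, so the result
  // does not depend on the order in which reflection reports the fields.
  static List<String> sortedFieldNames(Class<?> clazz) {
    List<String> names = new ArrayList<>();
    for (Field field : clazz.getDeclaredFields()) {
      // Skip compiler- or tool-generated fields.
      if (!field.isSynthetic()) {
        names.add(field.getName());
      }
    }
    Collections.sort(names);
    return names;
  }

  public static void main(String[] args) {
    System.out.println(sortedFieldNames(SamplePojo.class)); // [id, name, score]
  }
}
```

Sorting by name trades the declaration order for determinism, which may or may not be acceptable depending on whether the field order carries meaning.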
>>> 
>>> Correct, and see above.
>>>  
>>> 
>>> And now the actual questions: 
>>> - Should two Rows with the same field values, but whose schemas differ in 
>>> field order, be considered two different rows or not?
>>> - Is the behaviour explained above what the initial schema design 
>>> intended? 
>>> - If field order is so important, then why?
>>> 
>>> PS: My question is actually about "AvroRecordSchema().toRowFunction()", 
>>> but I guess other SchemaProviders can be affected as well.
>>> 
>>> 
>>> —
>>> Alexey
>>> 
>>> [1] https://beam.apache.org/documentation/programming-guide/#schema-definition
>>> [2] https://github.com/apache/beam/blob/0262ee53c6018d929a8a40fdf66735cc7e934951/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java#L303
>>> [3] https://github.com/apache/beam/blob/0262ee53c6018d929a8a40fdf66735cc7e934951/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/GetterBasedSchemaProvider.java#L91
> 
