> On 6 Apr 2022, at 17:26, Reuven Lax <[email protected]> wrote:
>
>> Check out SchemaTestUtils.equivalentTo. It should allow you to test that two
>> rows are equivalent (i.e. have the same fields, but possibly in a different
>> order).
>
> Thanks, I already did that for another test - AvroSchemaTest.testPojoSchema()
> - where we compare the schema [1], not rows.
>
> Though, I’m not sure this is the right workaround. If the goal of this
> test is to check that the Beam schema created from AvroPojo is the SAME as
> the default Beam POJO schema, then it’s not correct because, as you said
> above, from the Beam perspective they will be considered two different
> schemas because of the different field order.
>
> Why is that the goal of the test?
Well, I’m just guessing from the name and the initial code of this test
(which leverages Schema.equals() to compare the schemas). Am I mistaken here?
> The same issue applies to the AvroSchemaTest.testPojoRecordToRow() test,
> where we compare rows, and it fails because of:
>
> class Row {
>   boolean equals() {
>     …
>     if (!Objects.equals(getSchema(), other.getSchema())) {
>       return false;
>     }
>     …
>   }
> }
>
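To make the difference concrete, here is a minimal, Beam-free sketch of positional versus order-insensitive comparison. Everything in it is hypothetical: a schema is reduced to a plain list of field names (no types), a row to a list of values, and FieldOrderSketch and its methods are illustration names, not Beam API. It only shows why a comparison that checks the schema first fails when nothing but the field order differs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in types: a "schema" is a list of field names,
// a "row" is a list of values in the same positions. Not Beam classes.
public class FieldOrderSketch {

    // Order-sensitive: the same names in a different order are NOT equal.
    public static boolean sameSchema(List<String> a, List<String> b) {
        return a.equals(b);
    }

    // Order-insensitive: compare sorted copies of the field-name lists.
    public static boolean equivalentSchema(List<String> a, List<String> b) {
        List<String> x = new ArrayList<>(a);
        List<String> y = new ArrayList<>(b);
        Collections.sort(x);
        Collections.sort(y);
        return x.equals(y);
    }

    // View a positional row as a name -> value map.
    public static Map<String, Object> byName(List<String> schema, List<Object> values) {
        if (schema.size() != values.size()) {
            throw new IllegalArgumentException("schema and values differ in length");
        }
        Map<String, Object> result = new HashMap<>();
        for (int i = 0; i < schema.size(); i++) {
            result.put(schema.get(i), values.get(i));
        }
        return result;
    }

    // Rows are equivalent if every field name maps to the same value,
    // regardless of where the field sits in each schema.
    public static boolean equivalentRows(
            List<String> schemaA, List<Object> valuesA,
            List<String> schemaB, List<Object> valuesB) {
        return byName(schemaA, valuesA).equals(byName(schemaB, valuesB));
    }

    public static void main(String[] args) {
        List<String> s1 = Arrays.asList("name", "age");
        List<String> s2 = Arrays.asList("age", "name");
        List<Object> v1 = Arrays.<Object>asList("Alice", 30);
        List<Object> v2 = Arrays.<Object>asList(30, "Alice");

        System.out.println(sameSchema(s1, s2));             // false
        System.out.println(equivalentSchema(s1, s2));       // true
        System.out.println(equivalentRows(s1, v1, s2, v2)); // true
    }
}
```

A test that is meant to assert "same fields, any order" needs the second kind of check; a test that is meant to assert "identical schema" needs the first, and will then be sensitive to the field order the provider happens to produce.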
> [1]
> https://github.com/apache/beam/pull/17246/files#diff-ca874b6d378d007a590c7eb781635275623fd6d300ab1330f73c29951e7dc505R380
>
> —
> Alexey
>
>
>
>>
>>
>> [1] https://github.com/apache/beam/pull/17246
>>
>>>
>>> At the same time, when a schema is generated by different schema
>>> providers, the order of fields can be non-deterministic in some cases.
>>>
>>> For example, “GetterBasedSchemaProvider.toRowFunction(TypeDescriptor)” says
>>> [3] that:
>>> - “schemaFor is non deterministic - it might return fields in an arbitrary
>>> order. The reason why is that Java reflection does not guarantee the order
>>> in which it returns fields and methods, and these schemas are often based
>>> on reflective analysis of classes.”
>>>
>>> So, iiuc, it means that we can potentially have the "same" schema but with
>>> a different field order for the same class (a POJO, for example) when it is
>>> generated on different JVMs.
>>>
>>> Correct, and see above.
>>>
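As an illustration of the reflection point, here is a small sketch that relies only on the JDK. Class.getDeclaredFields() is documented as returning fields in no particular order, so one way to get a stable order is to sort by name. The Person class and the sortedFieldNames helper are hypothetical names for this example, and sorting is an assumption about one possible fix, not a description of what Beam's schema providers do.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DeterministicFieldOrder {

    // Hypothetical POJO used only for this example.
    public static class Person {
        String name;
        int age;
        boolean active;
    }

    // Collect instance field names and sort them, so the result does not
    // depend on the order in which reflection happens to return fields.
    public static List<String> sortedFieldNames(Class<?> clazz) {
        List<String> names = new ArrayList<>();
        for (Field field : clazz.getDeclaredFields()) {
            // Skip compiler- or tool-generated fields and static fields.
            if (field.isSynthetic() || Modifier.isStatic(field.getModifiers())) {
                continue;
            }
            names.add(field.getName());
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        // Prints the field names in alphabetical order on any JVM.
        System.out.println(sortedFieldNames(Person.class));
    }
}
```

Sorting makes the order reproducible across JVMs, at the cost of no longer matching the declaration order in the source file.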
>>>
>>> And the actual questions:
>>> - Should two Rows with the same field values, but with schemas that differ
>>> only in field order, be considered two different rows or not?
>>> - Is the behaviour explained above what the initial schema design intended?
>>> - If field order is so important, then why?
>>>
>>> PS: My question is actually related to “AvroRecordSchema().toRowFunction()”,
>>> but I guess other SchemaProviders can also be affected.
>>>
>>>
>>> —
>>> Alexey
>>>
>>> [1]
>>> https://beam.apache.org/documentation/programming-guide/#schema-definition
>>> [2]
>>> https://github.com/apache/beam/blob/0262ee53c6018d929a8a40fdf66735cc7e934951/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java#L303
>>> [3]
>>> https://github.com/apache/beam/blob/0262ee53c6018d929a8a40fdf66735cc7e934951/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/GetterBasedSchemaProvider.java#L91
>