[ https://issues.apache.org/jira/browse/BEAM-5646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653312#comment-16653312 ]
Gleb Kanterov commented on BEAM-5646:
-------------------------------------
[~kedin] do you have any thoughts on this, or could you mention somebody else who might?
> Equality is broken for Rows with BYTES field
> --------------------------------------------
>
> Key: BEAM-5646
> URL: https://issues.apache.org/jira/browse/BEAM-5646
> Project: Beam
> Issue Type: Bug
> Components: dsl-sql
> Affects Versions: 2.7.0
> Reporter: Gleb Kanterov
> Assignee: Xu Mingmin
> Priority: Major
>
> The problem is in `org.apache.beam.sdk.values.Row#equals` and `Row#hashCode`.
> Row stores fields of type BYTES as `byte[]`, and Java arrays compare by
> reference instead of by contents. As a result, two Rows that hold identical
> bytes in a BYTES field do not compare equal.
> These failing tests illustrate the problem:
> {code:java}
> @Test
> public void testByteArrayEquality() {
>   byte[] a0 = new byte[16];
>   byte[] b0 = new byte[16];
>   Schema schema = Schema.of(Schema.Field.of("bytes", Schema.FieldType.BYTES));
>   Row a = Row.withSchema(schema).addValue(a0).build();
>   Row b = Row.withSchema(schema).addValue(b0).build();
>   Assert.assertEquals(a, b);
> }
>
> @Test
> public void testByteBufferEquality() {
>   byte[] a0 = new byte[16];
>   byte[] b0 = new byte[16];
>   Schema schema = Schema.of(Schema.Field.of("bytes", Schema.FieldType.BYTES));
>   Row a = Row.withSchema(schema).addValue(ByteBuffer.wrap(a0)).build();
>   Row b = Row.withSchema(schema).addValue(ByteBuffer.wrap(b0)).build();
>   Assert.assertEquals(a, b);
> }
> {code}
>
> Option 1. Fix by storing BYTES values as `ByteBuffer` instead of `byte[]`,
> or as something simpler that doesn't have offsets. `Row#getValue` would then
> return this type. For consistency, it would be preferable to change
> `Row#getBytes` as well, in an incompatible way, so that it returns the same
> type as `Row#getValue`, which is how the rest of the getters behave.
>
> Option 2. Do what Spark does and add an `if (x instanceof byte[])` check to
> `equals`. The problem in Spark is that its `hashCode` implementation isn't
> consistent with `equals` (see SPARK-25122), so `hashCode` would need the
> same special case.
>
> Option 3. Treat it as intended behavior, and fix the
> `RowCoder#consistentWithEquals` implementation.
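
To make Option 2 concrete, here is a minimal plain-Java sketch of what content-based comparison with a matching hash could look like. The class `BytesFieldEquality` and its methods `fieldEquals` and `fieldHashCode` are hypothetical names used only for illustration; they are not part of Beam or Spark, and this is not a proposed patch to `Row`. Only `java.util.Arrays` and `java.util.Objects` from the JDK are used.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical helper, not Beam code: shows equals and hashCode
// special-casing byte[] in the same way so they stay consistent.
final class BytesFieldEquality {

    // Compare two field values; byte[] is compared by contents,
    // everything else falls back to the value's own equals().
    static boolean fieldEquals(Object a, Object b) {
        if (a instanceof byte[] && b instanceof byte[]) {
            return Arrays.equals((byte[]) a, (byte[]) b);
        }
        return Objects.equals(a, b);
    }

    // Hash a field value; byte[] is hashed by contents so that values
    // equal under fieldEquals always produce the same hash.
    static int fieldHashCode(Object a) {
        if (a instanceof byte[]) {
            return Arrays.hashCode((byte[]) a);
        }
        return Objects.hashCode(a);
    }

    public static void main(String[] args) {
        byte[] a0 = new byte[16];
        byte[] b0 = new byte[16];

        // Arrays inherit Object#equals, so this is a reference comparison.
        System.out.println(a0.equals(b0));            // false
        // Content-based comparison treats the two arrays as equal.
        System.out.println(fieldEquals(a0, b0));      // true
        // The hash uses the same special case, so it agrees with equals.
        System.out.println(fieldHashCode(a0) == fieldHashCode(b0)); // true
    }
}
```

The point of the sketch is that the `byte[]` special case has to appear in both methods; applying it to `equals` alone would reproduce the inconsistency described in SPARK-25122.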
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)