With adding BYTES type, we broke equality. `RowCoder#consistentWithEquals`
is always true, but this property doesn't hold for exotic types such as
`Map<BYTES, ?>`, `List<BYTES>`. The root cause is `byte[]`, where `equals`
is implemented as reference equality instead of structural.

Before we jump into solution mode, let's state what we want to have:
- *API* have stable API and be able to evolve efficient and use-case
specific implementations without breaking it
- *Correctness *we can't trade off correctness, a trivial implementation
should work
- *Performance *comparing equality is a fundamental operation, and we want
to make it cheap

1. set `consistentWithEquals` if there is BYTES field
Pros: almost no pros
Cons: It would introduce a significant number of allocations when comparing
rows, so we reject this option.

2. implement custom deep equals in `Row#equals`
Pros: good performance, doesn't change API, `Row#equals` is correct
Cons: doesn't work for `Map<byte[], ?>`, unless we roll own implementation
Cons: it's possible to obtain `List<byte[]>` from `getValue()` that has
broken equality, contains, etc, unless we roll own implementation
Cons: non-trivial and requires ~200LOC to implement

3. wrapping byte[] into Java object with fixed equality (e.g.,
StructuralByteArray)
Pros: good performance and flexible to change how Java wrapper is
implemented
Pros: simple, doesn't require any specialized collections, no surprises,
`Map<byte[], ?>` and `List<byte[]>` work.
Cons: will change the return type of `Row#getValue`

I want to suggest going with option #3. However, it isn't completely clear
what wrapper we want to use, either it could be StructuralByteArray, or
ByteBuffer. ByteBuffer is more standard. However, it comes with 4
additional integer fields. StructuralByteArray doesn't have anything not
necessary. One option would be adding `Row#getByteBuffer` that would be
`ByteBuffer.wrap(getValue(i).getValues())`, specialized implementation can
override it for better performance, but `getValue(i)` must return
StructuralByteArray.

References:
- [BEAM-5866] Fix `Row#equals`, https://github.com/apache/beam/pull/6845
- [BEAM-5646] Fix quality and hashcode for bytes in Row,
https://github.com/apache/beam/pull/6765

Gleb

Reply via email to