I'm open to renaming *consistentWithEquals*.

If I understand the code correctly, when consistentWithEquals returns
true, org.apache.beam.sdk.util.MutationDetectors
expects *a.equals(deserialize(serialize(a))* which I think is reasonable
for SerializableCoder (assuming objects implement equals)*. *Right now,
*serialize(a).equals(serialize(deserialize(serialize(a)))* is expected and
that contradicts *"does not guarantee a deterministic encoding"*.

On Mon, Nov 27, 2017 at 4:07 PM, Lukasz Cwik <[email protected]> wrote:

> I think the idea is that SerializableCoder should be updated to expect
> that all values it encodes do implement equals() since this seems to be the
> much more common case then classes that don't implement a useful equals. It
> would be possible to add a useful check to DirectRunner that any value that
> says its consistent with equals actually obeys its contract.
>
> On Mon, Nov 27, 2017 at 4:03 PM, Eugene Kirpichov <[email protected]>
> wrote:
>
>> Not sure where you see the contradiction? consistentWithEquals says
>> "Whenever the encoded bytes of two values are equal, then the original
>> values are equal according to {@code Objects.equals()}." - which is clearly
>> false for Serializable's in general: it's possible that serialized form of
>> "a" and "b" is the same bytes, but !a.equals(b), e.g. if this class does
>> not implement equals() or if it uses reference equality.
>>
>> On Mon, Nov 27, 2017 at 3:55 PM Mairbek Khadikov <[email protected]>
>> wrote:
>>
>>> Hi all,
>>>
>>> Currently SerializableCoder#consistentWithEquals returns false, which
>>> contradicts it's own documentation "{@link SerializableCoder} does not
>>> guarantee a deterministic encoding, as Java serialization may produce
>>> different binary encodings for two equivalent objects".
>>>
>>> In practice, it leads to failures in org.apache.beam.sdk.util.Mutat
>>> ionDetectors$CodedValueMutationDetector. For example, some libraries
>>> cache hash values https://github.com/google/protobuf/pull/3939/files#
>>> diff-24c442a23a43e041e4f50d324638fedbR142
>>>
>>> I'm thinking about changing SerializableCoder#consistentWithEquals to
>>> return true, so that the structural value would be the object itself
>>> instead of it the serialized bytes. Thoughts?
>>>
>>> WIP branch https://github.com/mairbek/beam/tree/ser
>>>
>>> Sincerely,
>>> Mairbek
>>>
>>> --
>>>
>>> Mairbek Khadikov |  Software Engineer |  [email protected]
>>>
>>>
>


-- 

Mairbek Khadikov |  Software Engineer |  [email protected]

Reply via email to