I'm open to renaming *consistentWithEquals*. If I understand the code correctly, when consistentWithEquals returns true, org.apache.beam.sdk.util.MutationDetectors expects *a.equals(deserialize(serialize(a))* which I think is reasonable for SerializableCoder (assuming objects implement equals)*. *Right now, *serialize(a).equals(serialize(deserialize(serialize(a)))* is expected and that contradicts *"does not guarantee a deterministic encoding"*.
On Mon, Nov 27, 2017 at 4:07 PM, Lukasz Cwik <[email protected]> wrote: > I think the idea is that SerializableCoder should be updated to expect > that all values it encodes do implement equals() since this seems to be the > much more common case then classes that don't implement a useful equals. It > would be possible to add a useful check to DirectRunner that any value that > says its consistent with equals actually obeys its contract. > > On Mon, Nov 27, 2017 at 4:03 PM, Eugene Kirpichov <[email protected]> > wrote: > >> Not sure where you see the contradiction? consistentWithEquals says >> "Whenever the encoded bytes of two values are equal, then the original >> values are equal according to {@code Objects.equals()}." - which is clearly >> false for Serializable's in general: it's possible that serialized form of >> "a" and "b" is the same bytes, but !a.equals(b), e.g. if this class does >> not implement equals() or if it uses reference equality. >> >> On Mon, Nov 27, 2017 at 3:55 PM Mairbek Khadikov <[email protected]> >> wrote: >> >>> Hi all, >>> >>> Currently SerializableCoder#consistentWithEquals returns false, which >>> contradicts it's own documentation "{@link SerializableCoder} does not >>> guarantee a deterministic encoding, as Java serialization may produce >>> different binary encodings for two equivalent objects". >>> >>> In practice, it leads to failures in org.apache.beam.sdk.util.Mutat >>> ionDetectors$CodedValueMutationDetector. For example, some libraries >>> cache hash values https://github.com/google/protobuf/pull/3939/files# >>> diff-24c442a23a43e041e4f50d324638fedbR142 >>> >>> I'm thinking about changing SerializableCoder#consistentWithEquals to >>> return true, so that the structural value would be the object itself >>> instead of it the serialized bytes. Thoughts? >>> >>> WIP branch https://github.com/mairbek/beam/tree/ser >>> >>> Sincerely, >>> Mairbek >>> >>> -- >>> >>> Mairbek Khadikov | Software Engineer | [email protected] >>> >>> > -- Mairbek Khadikov | Software Engineer | [email protected]
