Eugene Kirpichov created BEAM-2699:
--------------------------------------
Summary: AppliedPTransform is used as a key in hashmaps but
PTransform is not hashable/equality-comparable
Key: BEAM-2699
URL: https://issues.apache.org/jira/browse/BEAM-2699
Project: Beam
Issue Type: Bug
Components: runner-core
Reporter: Eugene Kirpichov
Assignee: Thomas Groh
There are plenty of occurrences in runners-core of Map or BiMap where the key is
an AppliedPTransform.
However, PTransform does not advertise that it is required to implement
equals/hashCode, and some transforms can't do it properly anyway - for example,
transforms that capture a ValueProvider, which is also not
hashable/equality-comparable. I'm surprised that things aren't already very broken
because of this.
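
To make the failure mode concrete, here is a minimal sketch. FakeTransform and
FakeApplied are hypothetical stand-ins, not Beam classes, and the sketch assumes
the key's equals/hashCode delegate to the transform it wraps. A lookup only
succeeds when the very same transform instance is reused:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class AppliedKeyDemo {
    // Hypothetical stand-in for a PTransform that does not override equals/hashCode,
    // so it inherits identity-based equality from Object.
    static final class FakeTransform {
        final String name;
        FakeTransform(String name) { this.name = name; }
    }

    // Hypothetical stand-in for AppliedPTransform. Assumption: equality is value-based
    // and delegates to the wrapped transform.
    static final class FakeApplied {
        final String fullName;
        final FakeTransform transform;
        FakeApplied(String fullName, FakeTransform transform) {
            this.fullName = fullName;
            this.transform = transform;
        }
        @Override public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof FakeApplied)) return false;
            FakeApplied that = (FakeApplied) o;
            return fullName.equals(that.fullName) && transform.equals(that.transform);
        }
        @Override public int hashCode() { return Objects.hash(fullName, transform); }
    }

    // Two keys built around the same transform instance compare equal.
    static boolean lookupWithSameTransformInstance() {
        FakeTransform t = new FakeTransform("Read");
        Map<FakeApplied, String> m = new HashMap<>();
        m.put(new FakeApplied("Read", t), "x");
        return m.containsKey(new FakeApplied("Read", t));
    }

    // Two keys built around equivalent but distinct transform instances do not.
    static boolean lookupWithEquivalentTransformCopy() {
        Map<FakeApplied, String> m = new HashMap<>();
        m.put(new FakeApplied("Read", new FakeTransform("Read")), "x");
        return m.containsKey(new FakeApplied("Read", new FakeTransform("Read")));
    }

    public static void main(String[] args) {
        System.out.println(lookupWithSameTransformInstance());
        System.out.println(lookupWithEquivalentTransformCopy());
    }
}
```

In this sketch the map silently degrades to identity semantics on the transform
field, which nothing in the key's API makes obvious.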
Fundamentally, I don't see why we should ever compare two PTransforms for
equality.
I looked at the code and wondered whether AppliedPTransform can simply be
identity-hashable, but right now the answer is no, because we can create
multiple AppliedPTransform objects for the same transform applied to the same
input.
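
A small sketch of why identity hashing does not work today. FakeNode and
FakeApplied are hypothetical stand-ins, and the sketch assumes that each
conversion call builds a fresh object:

```java
import java.util.IdentityHashMap;
import java.util.Map;

class IdentityKeyDemo {
    // Hypothetical stand-in for AppliedPTransform with no equals/hashCode override.
    static final class FakeApplied {
        final String fullName;
        FakeApplied(String fullName) { this.fullName = fullName; }
    }

    // Hypothetical stand-in for TransformHierarchy.Node. Assumption: every
    // conversion call allocates a new FakeApplied.
    static final class FakeNode {
        final String fullName;
        FakeNode(String fullName) { this.fullName = fullName; }
        FakeApplied toApplied() { return new FakeApplied(fullName); }
    }

    // The second conversion yields a different object, so an identity-based
    // lookup misses the entry stored under the first one.
    static boolean foundAfterSecondConversion() {
        FakeNode node = new FakeNode("Read");
        Map<FakeApplied, String> m = new IdentityHashMap<>();
        m.put(node.toApplied(), "x");
        return m.containsKey(node.toApplied());
    }

    public static void main(String[] args) {
        System.out.println(foundAfterSecondConversion());
    }
}
```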
Fixing that appears to be not very easy, but definitely possible. Ideally
TransformHierarchy.Node would just know its AppliedPTransform; however, a Node
can be constructed when there is no Pipeline yet. I suppose there has to be some
way to propagate a Pipeline into Node.finishSpecifying() (which should be
called exactly once on the Node, and this should be enforced), have
finishSpecifying() return the AppliedPTransform, and have the caller use that
instead of potentially calling .toAppliedPTransform() repeatedly on the same
Node.
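
Roughly what I have in mind, as a sketch only. FakeNode, FakeApplied and
FakePipeline are hypothetical stand-ins, and the finishSpecifying(pipeline)
signature is the proposal above, not the current API:

```java
import java.util.IdentityHashMap;
import java.util.Map;

class FinishSpecifyingDemo {
    // Hypothetical stand-ins; none of these are the real Beam classes.
    static final class FakePipeline {}

    static final class FakeApplied {
        final String fullName;
        final FakePipeline pipeline;
        FakeApplied(String fullName, FakePipeline pipeline) {
            this.fullName = fullName;
            this.pipeline = pipeline;
        }
    }

    static final class FakeNode {
        private final String fullName;
        private FakeApplied applied; // stays null until finishSpecifying() runs

        FakeNode(String fullName) { this.fullName = fullName; }

        // Proposed shape: receives the Pipeline, may be called exactly once,
        // and hands back the single FakeApplied for this node.
        FakeApplied finishSpecifying(FakePipeline pipeline) {
            if (applied != null) {
                throw new IllegalStateException(
                    "finishSpecifying() called twice on " + fullName);
            }
            applied = new FakeApplied(fullName, pipeline);
            return applied;
        }
    }

    // With a single object per node, identity-based maps become safe.
    static boolean foundByIdentity() {
        FakeNode node = new FakeNode("Read");
        FakeApplied key = node.finishSpecifying(new FakePipeline());
        Map<FakeApplied, String> m = new IdentityHashMap<>();
        m.put(key, "x");
        return m.containsKey(key);
    }

    // The exactly-once rule is enforced rather than assumed.
    static boolean secondCallRejected() {
        FakeNode node = new FakeNode("Read");
        FakePipeline p = new FakePipeline();
        node.finishSpecifying(p);
        try {
            node.finishSpecifying(p);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(foundByIdentity());
        System.out.println(secondCallRejected());
    }
}
```

The caller keeps the returned object and uses it as the map key, so no
PTransform ever needs to be compared for equality.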
[~kenn] is on vacation but perhaps [~tgroh] can help with this meanwhile?
CC: [~reuvenlax]