[
https://issues.apache.org/jira/browse/BEAM-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105917#comment-16105917
]
Eugene Kirpichov commented on BEAM-2699:
----------------------------------------
By the way, this was found when testing a PR where DirectRunner to/from proto
machinery ended up hashing a transform that captured a null ValueProvider and
used it in hashCode() as "provider.get()" - this threw an NPE, but generally
this code shouldn't have been called at all at construction time, because
ValueProvider could have been inaccessible. On the other hand, hashing the
ValueProvider itself is also incorrect (and throws errors in runners-core
instead) because ValueProvider does not implement hashCode/equals.
> AppliedPTransform is used as a key in hashmaps but PTransform is not
> hashable/equality-comparable
> -------------------------------------------------------------------------------------------------
>
> Key: BEAM-2699
> URL: https://issues.apache.org/jira/browse/BEAM-2699
> Project: Beam
> Issue Type: Bug
> Components: runner-core
> Reporter: Eugene Kirpichov
> Assignee: Thomas Groh
>
> There's plenty of occurrences in runners-core of Map or BiMap where the key
> is an AppliedPTransform.
> However, PTransform does not advertise that it is required to implement
> equals/hashCode, and some transforms can't do it properly anyway - for
> example, transforms that capture a ValueProvider which is also not
> hashable/eq-comparable. I'm surprised that things aren't already very broken
> because of this.
> Fundamentally, I don't see why we should ever compare two PTransform's for
> equality.
> I looked at the code and wondered "can AppliedPTransform simply be
> identity-hashable", but right now the answer is no because we can create an
> AppliedPTransform for the same transform applied to the same thing multiple
> times.
> Fixing that appears to be not very easy, but definitely possible. Ideally
> TransformHierarchy.Node would just know its AppliedPTransform, however a Node
> can be constructed when there's yet no Pipeline. Suppose there's gotta be
> some way to propagate a Pipeline into Node.finishSpecifying() (which should
> be called exactly once on the Node, and this should be enforced), and have
> finishSpecifying() return the AppliedPTransform, and have the caller use that
> instead of potentially repeatedly calling .toAppliedPTransform() on the same
> Node.
> [~kenn] is on vacation but perhaps [~tgroh] can help with this meanwhile?
> CC: [~reuvenlax]
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)