[
https://issues.apache.org/jira/browse/BEAM-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16106003#comment-16106003
]
Eugene Kirpichov commented on BEAM-2699:
----------------------------------------
You're right, this is another bug in the way things are currently done.
An AppliedPTransform (APT) used as a key can both fail to find the same
application (if the transform is not hashable) and fail to distinguish
between two applications of the same transform to the same input.
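
To make the two failure modes concrete, here is a minimal sketch in plain Java.
The classes OpaqueTransform and Applied are hypothetical stand-ins for a
PTransform without equals/hashCode and for a value-compared AppliedPTransform;
they are not Beam's actual classes, and the "rebuilt transform" scenario is an
assumption about how a lookup might miss.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Objects;

public class AppliedKeyDemo {
  // Hypothetical stand-in for a PTransform that does not override
  // equals/hashCode, so it falls back to object identity.
  static final class OpaqueTransform {
    final String name;
    OpaqueTransform(String name) { this.name = name; }
  }

  // Hypothetical stand-in for an AppliedPTransform compared by value:
  // (input, transform). Not the real Beam class.
  static final class Applied {
    final String input;
    final Object transform;
    Applied(String input, Object transform) {
      this.input = input;
      this.transform = transform;
    }
    @Override public boolean equals(Object o) {
      if (!(o instanceof Applied)) return false;
      Applied that = (Applied) o;
      return input.equals(that.input) && transform.equals(that.transform);
    }
    @Override public int hashCode() { return Objects.hash(input, transform); }
  }

  // Failure 1: the "same" application is rebuilt with a fresh transform
  // object; since the transform is identity-compared, the lookup misses.
  static boolean lookupMissesRebuiltTransform() {
    Map<Applied, String> state = new HashMap<>();
    state.put(new Applied("pc1", new OpaqueTransform("read")), "state");
    return state.get(new Applied("pc1", new OpaqueTransform("read"))) == null;
  }

  // Failure 2: one transform instance applied twice to the same input
  // yields two equal keys, so the second entry overwrites the first.
  static int entriesForTwoApplications() {
    OpaqueTransform t = new OpaqueTransform("count");
    Map<Applied, String> state = new HashMap<>();
    state.put(new Applied("pc1", t), "first");
    state.put(new Applied("pc1", t), "second");
    return state.size();
  }

  // With identity-based keys, each application object stays distinct,
  // provided exactly one key object exists per application.
  static int identityEntriesForTwoApplications() {
    OpaqueTransform t = new OpaqueTransform("count");
    Map<Applied, String> state = new IdentityHashMap<>();
    state.put(new Applied("pc1", t), "first");
    state.put(new Applied("pc1", t), "second");
    return state.size();
  }

  public static void main(String[] args) {
    System.out.println(lookupMissesRebuiltTransform());
    System.out.println(entriesForTwoApplications());
    System.out.println(identityEntriesForTwoApplications());
  }
}
```

The identity-keyed variant only helps if each application is represented by a
single key object, which is the condition the issue description below says
does not hold today.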
On Fri, Jul 28, 2017, 7:42 PM Kenneth Knowles (JIRA) <[email protected]> wrote:
> AppliedPTransform is used as a key in hashmaps but PTransform is not
> hashable/equality-comparable
> -------------------------------------------------------------------------------------------------
>
> Key: BEAM-2699
> URL: https://issues.apache.org/jira/browse/BEAM-2699
> Project: Beam
> Issue Type: Bug
> Components: runner-core
> Reporter: Eugene Kirpichov
> Assignee: Thomas Groh
>
> There are plenty of occurrences in runners-core of Map or BiMap where the key
> is an AppliedPTransform.
> However, PTransform does not advertise that it is required to implement
> equals/hashCode, and some transforms can't do it properly anyway - for
> example, transforms that capture a ValueProvider which is also not
> hashable/eq-comparable. I'm surprised that things aren't already very broken
> because of this.
> Fundamentally, I don't see why we should ever compare two PTransforms for
> equality.
> I looked at the code and wondered "can AppliedPTransform simply be
> identity-hashable", but right now the answer is no because we can create an
> AppliedPTransform for the same transform applied to the same thing multiple
> times.
> Fixing that appears to be not very easy, but definitely possible. Ideally,
> TransformHierarchy.Node would just know its AppliedPTransform; however, a Node
> can be constructed before there is a Pipeline. I suppose there has to be
> some way to propagate a Pipeline into Node.finishSpecifying() (which should
> be called exactly once on the Node, and this should be enforced), have
> finishSpecifying() return the AppliedPTransform, and have the caller use that
> instead of potentially calling .toAppliedPTransform() repeatedly on the same
> Node.
> [~kenn] is on vacation but perhaps [~tgroh] can help with this meanwhile?
> CC: [~reuvenlax]
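
The finishSpecifying() proposal quoted above could look roughly like the
following sketch. Pipeline, Applied and Node here are simplified, hypothetical
classes written for illustration only; the signature
finishSpecifying(Pipeline) is the suggested change, not Beam's existing API.

```java
import java.util.IdentityHashMap;
import java.util.Map;

public class NodeSketch {
  // Hypothetical, simplified stand-ins; not the Beam classes.
  static final class Pipeline {}

  // No equals/hashCode override: instances are compared by identity.
  static final class Applied {
    final Pipeline pipeline;
    final String fullName;
    Applied(Pipeline pipeline, String fullName) {
      this.pipeline = pipeline;
      this.fullName = fullName;
    }
  }

  static final class Node {
    private final String fullName;
    private Applied applied; // null until finishSpecifying has run

    Node(String fullName) { this.fullName = fullName; }

    // Receives the Pipeline late, creates the single Applied object for
    // this node, and rejects any second call.
    Applied finishSpecifying(Pipeline pipeline) {
      if (applied != null) {
        throw new IllegalStateException(
            "finishSpecifying already called for " + fullName);
      }
      applied = new Applied(pipeline, fullName);
      return applied;
    }
  }

  // Returns true if a second finishSpecifying call is rejected.
  static boolean secondCallRejected() {
    Node node = new Node("Count");
    Pipeline p = new Pipeline();
    node.finishSpecifying(p);
    try {
      node.finishSpecifying(p);
      return false;
    } catch (IllegalStateException expected) {
      return true;
    }
  }

  // Two nodes with the same name on the same pipeline still produce two
  // distinct identity keys, one per application.
  static int distinctKeysForTwoNodes() {
    Pipeline p = new Pipeline();
    Map<Applied, String> state = new IdentityHashMap<>();
    state.put(new Node("Count").finishSpecifying(p), "first");
    state.put(new Node("Count").finishSpecifying(p), "second");
    return state.size();
  }

  public static void main(String[] args) {
    System.out.println(secondCallRejected());
    System.out.println(distinctKeysForTwoNodes());
  }
}
```

Because each Node hands out exactly one Applied object, callers can key maps
by identity without ever comparing transforms for equality.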