[ https://issues.apache.org/jira/browse/BEAM-8140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16948881#comment-16948881 ]
Robert Bradshaw commented on BEAM-8140:
---------------------------------------

I know we made it possible to apply the same PTransform as many times as you want within a pipeline, but I don't recall why it cares about (or stores a reference to) the pipeline itself. This code was intended to prohibit mixing values across pipelines (e.g. flattening a PCollection from one pipeline with a PCollection from another). This should be fixed.

> Python API: PTransform should be immutable
> ------------------------------------------
>
>                 Key: BEAM-8140
>                 URL: https://issues.apache.org/jira/browse/BEAM-8140
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-core
>            Reporter: Chris Suchandk
>            Assignee: Robert Bradshaw
>            Priority: Major
>
> While the Java API seems fine, the Python API is (at least) counterintuitive.
> Consider the following example:
> {code:python}
> import apache_beam as beam
>
> p1 = beam.Pipeline()
> p2 = beam.Pipeline()
> node = 'ReadTrainData' >> beam.io.ReadFromText("/tmp/aaa.txt")
> p1 | node
> p2 | node  # fails here
> {code}
> The code above will fail because the _node_ somehow remembers that it was
> already attached to _p1_. In fact, unlike in Java, the | (apply) method is
> defined on the _PTransform_.
> If anything, only the pipeline object should be mutable here.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
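To make the distinction in this thread concrete, here is a minimal pure-Python sketch of the design the reporter asks for. `ToyPipeline`, `ToyPCollection` and `ToyTransform` are hypothetical classes invented for illustration; they are not Apache Beam classes and do not reproduce Beam's actual implementation. The sketch assumes that the pipeline records every application, that the transform is a frozen dataclass holding no pipeline reference, and that the only cross-pipeline check (the one the comment says the code was meant for, e.g. when flattening) is made against the pipelines of the input values themselves.

```python
from dataclasses import dataclass, replace


class ToyPipeline:
    """Hypothetical stand-in for a pipeline; it owns all the mutable state."""

    def __init__(self, name):
        self.name = name
        self.applied = []  # labels of transforms applied to this pipeline, in order

    def apply(self, transform):
        # The application is recorded on the pipeline, never on the transform.
        self.applied.append(transform.label)
        return ToyPCollection(self, transform.label)


class ToyPCollection:
    """Hypothetical stand-in for a PCollection; it belongs to exactly one pipeline."""

    def __init__(self, pipeline, producer_label):
        self.pipeline = pipeline
        self.producer_label = producer_label


@dataclass(frozen=True)
class ToyTransform:
    """Hypothetical immutable transform: it never stores a pipeline reference."""

    label: str
    source: str = ""  # illustrative configuration, e.g. a file path

    def __rrshift__(self, label):
        # 'Label' >> transform returns a relabeled copy instead of mutating self.
        return replace(self, label=label)

    def __ror__(self, left):
        # `left | transform` lands here because the left operand defines no __or__.
        if isinstance(left, ToyPipeline):
            pipeline = left
        else:
            inputs = left if isinstance(left, (list, tuple)) else [left]
            pipelines = {pc.pipeline for pc in inputs}
            if len(pipelines) != 1:
                # Pipeline identity is checked on the inputs, not on the transform.
                raise ValueError("Inputs must all come from exactly one pipeline.")
            (pipeline,) = pipelines
        return pipeline.apply(self)


# The reporter's scenario, expressed with the toy classes.
p1 = ToyPipeline("p1")
p2 = ToyPipeline("p2")
node = "ReadTrainData" >> ToyTransform("ReadFromText", source="/tmp/aaa.txt")
pc1 = p1 | node
pc2 = p2 | node  # no error: node holds no reference to p1
```

In this shape the same `node` can be applied to any number of pipelines, while something like `(pc1, pc2) | ToyTransform("Flatten")` still raises `ValueError`, because the two inputs belong to different pipelines. Attempting `node.label = "x"` raises `dataclasses.FrozenInstanceError`, so the transform cannot silently pick up per-pipeline state.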