[ 
https://issues.apache.org/jira/browse/BEAM-8140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16948881#comment-16948881
 ] 

Robert Bradshaw commented on BEAM-8140:
---------------------------------------

I know we made it possible to apply the same PTransform as many times as you 
want within a pipeline, but I don't recall why it cares about (or stores a 
reference to) the pipeline itself. This code was intended to prohibit mixing 
values across pipelines (e.g. flattening a PCollection from one pipeline with 
a PCollection from another). This should be fixed. 
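
To illustrate the intent, here is a minimal toy sketch (plain Python, not 
Beam's actual implementation) of where such a check could live. It assumes 
hypothetical Transform, Pipeline and Value classes, with Value standing in 
for a PCollection: the transform holds no pipeline reference, and the 
cross-pipeline check runs on the input values when the pipeline applies it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Transform:
    """Hypothetical immutable step description; holds no pipeline reference."""
    label: str


@dataclass
class Pipeline:
    """Hypothetical pipeline; the only mutable object in this sketch."""
    name: str
    applied: list = field(default_factory=list)

    def apply(self, transform, *inputs):
        # Reject any input that was produced by a different pipeline.
        for value in inputs:
            if value.pipeline is not self:
                raise ValueError(
                    f"{transform.label}: input belongs to pipeline "
                    f"{value.pipeline.name!r}, not {self.name!r}")
        # Only the pipeline records the application; the transform is untouched.
        self.applied.append(transform.label)
        return Value(pipeline=self, producer=transform)


@dataclass(frozen=True)
class Value:
    """Stand-in for a PCollection; remembers the pipeline that owns it."""
    pipeline: Pipeline
    producer: Transform


read = Transform("ReadTrainData")
flatten = Transform("Flatten")
p1, p2 = Pipeline("p1"), Pipeline("p2")

a = p1.apply(read)
b = p2.apply(read)  # same transform object on a second pipeline: allowed

try:
    p1.apply(flatten, a, b)  # mixes values from p1 and p2: rejected
except ValueError as err:
    print(err)
```

In this sketch, reusing the transform across pipelines works because the 
ownership information sits on the values, so mixing is still caught at the 
point where it actually happens.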

> Python API: PTransform should be immutable
> ------------------------------------------
>
>                 Key: BEAM-8140
>                 URL: https://issues.apache.org/jira/browse/BEAM-8140
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-core
>            Reporter: Chris Suchandk
>            Assignee: Robert Bradshaw
>            Priority: Major
>
> While the Java API seems fine, the Python API is (at least) counterintuitive.
> Consider the following example:
> {code:python}
> import apache_beam as beam
>
> p1 = beam.Pipeline()
> p2 = beam.Pipeline()
> node = 'ReadTrainData' >> beam.io.ReadFromText("/tmp/aaa.txt")
> p1 | node
> p2 | node  # fails here
> {code}
> The code above fails because _node_ somehow remembers that it was already 
> attached to _p1_. In fact, unlike in Java, the | (apply) method is defined 
> on the _PTransform_.
> If anything, only the pipeline object should be mutable here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
