[
https://issues.apache.org/jira/browse/BEAM-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chad Dombrova updated BEAM-8088:
--------------------------------
Summary: [python] PCollection boundedness should be tracked and propagated
(was: PCollection boundedness should be tracked and propagated)
> [python] PCollection boundedness should be tracked and propagated
> -----------------------------------------------------------------
>
> Key: BEAM-8088
> URL: https://issues.apache.org/jira/browse/BEAM-8088
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-core
> Reporter: Chad Dombrova
> Assignee: Chad Dombrova
> Priority: Major
> Fix For: 2.16.0
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> As far as I can tell Python does not care about boundedness of PCollections
> even in streaming mode, but external transforms _do_. In my ongoing effort
> to get PubsubIO external transforms working I discovered that I could not
> generate an unbounded write.
> My pipeline looks like this:
> {code:python}
> (
> pipe
> | 'PubSubInflow' >>
> external.pubsub.ReadFromPubSub(subscription=subscription,
> with_attributes=True)
> | 'PubSubOutflow' >>
> external.pubsub.WriteToPubSub(topic=OUTPUT_TOPIC, with_attributes=True)
> )
> {code}
> The PCollections returned from the external Read are Unbounded, as expected,
> but python is responsible for creating the intermediate PCollection, which is
> always Bounded, and thus external Write is always Bounded.
--
This message was sent by Atlassian Jira
(v8.3.2#803003)