chamikaramj commented on a change in pull request #12888:
URL: https://github.com/apache/beam/pull/12888#discussion_r493800318



##########
File path: sdks/python/apache_beam/io/iobase.py
##########
@@ -992,6 +1012,43 @@ def expand(self, pcoll):
           'A sink must inherit iobase.Sink, iobase.NativeSink, '
           'or be a PTransform. Received : %r' % self.sink)
 
+  def _pubsub_write_payload(self):
+    return beam_runner_api_pb2.PubSubWritePayload(
+        topic=self.sink.full_topic,
+        id_attribute=self.sink.id_label,
+        timestamp_attribute=self.sink.timestamp_attribute)
+
+  def to_runner_api_parameter(self, context):
+    # type: (PipelineContext) -> Tuple[str, Any]
+    # Importing locally to prevent circular dependencies.
+    from apache_beam.io.gcp.pubsub import _PubSubSink
+    if isinstance(self.sink, _PubSubSink):
+      payload = self._pubsub_write_payload()
+      return (common_urns.composites.PUBSUB_WRITE.urn, payload)
+    else:
+      return super(Write, self).to_runner_api_parameter(context)
+
+  @staticmethod
+  @ptransform.PTransform.register_urn(
+      common_urns.composites.PUBSUB_WRITE.urn,
+      beam_runner_api_pb2.PubSubWritePayload)
+  def from_runner_api_parameter(ptransform, payload, unused_context):
+    # type: (Any, Any, PipelineContext) -> Write
+    if ptransform.spec.urn != common_urns.composites.PUBSUB_WRITE.urn:
+      raise ValueError(
+          'Write transform cannot be constructed for the given proto %r' %
+          ptransform)
+
+    # Importing locally to prevent circular dependencies.
+    from apache_beam.io.gcp.pubsub import _PubSubSink
+    sink = _PubSubSink(
+        topic=payload.topic,
+        id_label=payload.id_attribute,
+        with_attributes=True,

Review comment:
   I think the crux of the matter is that runners (both Dataflow and Direct) depend on the pipeline -> proto -> pipeline round trip to preserve transform state.
   All runners depend on it here:
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/pipeline.py#L514
   Additionally, Dataflow performs a second round trip here:
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L511
   
   The DirectRunner also reads the `with_attributes` property here and expects it to be preserved by this round trip:
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/direct/direct_runner.py#L406
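   To illustrate why a flipped `with_attributes` flag matters downstream, here is a hypothetical writer step in plain Python. It is not the DirectRunner's actual code: `FakeMessage` and `to_wire` are invented names, and the only assumption is that a write step handles elements differently depending on `with_attributes`.

```python
from typing import Dict, NamedTuple


class FakeMessage(NamedTuple):
  """Hypothetical stand-in for a Pub/Sub message with attributes."""
  data: bytes
  attributes: Dict[str, str]


def to_wire(element, with_attributes):
  """Hypothetical writer step that branches on with_attributes.

  With attributes, elements must be message objects; without them,
  elements must be raw bytes and are sent with empty attributes.
  """
  if with_attributes:
    if not isinstance(element, FakeMessage):
      raise TypeError('expected a message with attributes, got %r' % (element,))
    return element.data, dict(element.attributes)
  if not isinstance(element, bytes):
    raise TypeError('expected bytes, got %r' % (element,))
  return element, {}
```

   Here `to_wire(b'payload', with_attributes=False)` returns `(b'payload', {})`, while the same element with the flag forced to `True` (what hard-coding it on decode would do) raises `TypeError`.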
   
   My proposal is to remove the other Dataflow-specific state from the proto and keep `with_attributes`, which is needed to preserve the state of these transforms today.
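   A minimal sketch of the proposal in plain Python, without Beam. `FakePubSubSink`, `to_payload`, `from_payload_hardcoded`, `from_payload_proposed`, and `round_trip` are hypothetical stand-ins for `_PubSubSink` and the `to_runner_api_parameter`/`from_runner_api_parameter` pair, and a dict stands in for `PubSubWritePayload`. It contrasts a decoder that hard-codes `with_attributes=True` (as the diff does) with one that reads the flag back from the payload; it does not try to decide which of the other fields count as Dataflow-specific.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FakePubSubSink:
  """Hypothetical stand-in for _PubSubSink; not the real Beam class."""
  topic: str
  id_label: Optional[str] = None
  timestamp_attribute: Optional[str] = None
  with_attributes: bool = False


def to_payload(sink):
  # Proposed encoding: with_attributes travels with the other fields.
  return {
      'topic': sink.topic,
      'id_attribute': sink.id_label,
      'timestamp_attribute': sink.timestamp_attribute,
      'with_attributes': sink.with_attributes,
  }


def from_payload_hardcoded(payload):
  # Mirrors the diff under review: the flag is forced to True on decode.
  return FakePubSubSink(
      topic=payload['topic'],
      id_label=payload['id_attribute'],
      timestamp_attribute=payload['timestamp_attribute'],
      with_attributes=True)


def from_payload_proposed(payload):
  # Proposed decoding: read the flag back from the payload.
  return FakePubSubSink(
      topic=payload['topic'],
      id_label=payload['id_attribute'],
      timestamp_attribute=payload['timestamp_attribute'],
      with_attributes=payload['with_attributes'])


def round_trip(sink, decode, times):
  # Apply encode/decode `times` times, e.g. once for pipeline.py and
  # once more for the extra Dataflow round trip.
  for _ in range(times):
    sink = decode(to_payload(sink))
  return sink


sink = FakePubSubSink(topic='projects/p/topics/t', with_attributes=False)
print(round_trip(sink, from_payload_hardcoded, 1).with_attributes)  # True
print(round_trip(sink, from_payload_proposed, 2) == sink)  # True
```

   With the flag in the payload, any number of round trips returns an equal sink, so it does not matter how many times a runner converts the pipeline to and from the proto.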




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
