[GitHub] [beam] twang126 commented on a diff in pull request #25935: Add a source and sink option to yaml pipelines.

via GitHub Wed, 22 Mar 2023 11:18:53 -0700


twang126 commented on code in PR #25935:
URL: https://github.com/apache/beam/pull/25935#discussion_r1145237180



##########
sdks/python/apache_beam/yaml/yaml_transform.py:
##########
@@ -377,6 +378,18 @@ def pipeline_as_composite(spec):
     return dict(spec, name=None, type='composite')
 
 
+def normalize_source_sink(spec):

Review Comment:
   We handled both as lists and did as you suggested (flatten all sources, 
write to each sink). I don't think it was too magical; users usually expected 
that behavior anyway. We had sources and sinks as separate types and, looking 
back, it was a headache to maintain all of the different types, but that 
divergence is probably inevitable. For example, parsing a user transform is 
going to be very different from reading in a specific Kafka settings config. 
So even if sources aren't their own type from the beginning, they might 
eventually end up there anyway. 
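
   A rough sketch of the list-based handling described above, in plain Python 
rather than Beam. The helper names (`as_list`, `run_linear`), the spec keys, 
and the `read`/`write` callbacks are all made up for illustration; this is not 
our actual code or what the PR does. 

```python
from itertools import chain


def as_list(value):
    """Treat a missing value as [], a single item as [item], a list as-is."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def run_linear(spec, read, write):
    """Hypothetical linear runner: flatten sources, transform, fan out to sinks.

    `read(source)` returns an iterable of records and `write(sink, records)`
    stores them; both are assumptions standing in for real IO.
    """
    # Flatten: concatenate the records from every source into one collection.
    sources = as_list(spec.get('source'))
    records = list(chain.from_iterable(read(s) for s in sources))
    # Apply the single transform chain in order.
    for fn in as_list(spec.get('transforms')):
        records = [fn(r) for r in records]
    # Fan out: every sink receives the same final result.
    for sink in as_list(spec.get('sink')):
        write(sink, records)
    return records
```

   Note that there is exactly one chain between the flattened sources and the 
sinks, so this shape has no way to express a second chain on the same source 
or a sink on an intermediate result. 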
    
   Something to point out: our strategy got restrictive with respect to 
branching (e.g. applying distinct transform chains to the same source or 
sinking intermediate results), and this is where the "too much magic" came 
back to bite us. How are you planning to support branching? 
   
   FWIW, my recommendation would be to keep the magic out of it to start; if 
users want multiple sources/sinks at the moment, they can just add them 
directly to the transform list. 
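
   To illustrate the recommendation above, here is a minimal sketch of a spec 
where reads and writes are ordinary entries in the transform list, written as 
a Python dict. The `Hypothetical*` type names, the use of an `input` key to 
wire transforms together, and the `consumers` helper are assumptions for 
illustration only. 

```python
# Sources and sinks are just transforms; wiring is explicit via `input`.
spec = {
    'type': 'composite',
    'transforms': [
        {'name': 'read', 'type': 'HypotheticalRead'},
        # Two distinct chains branch off the same source.
        {'name': 'clean', 'type': 'HypotheticalMap', 'input': 'read'},
        {'name': 'stats', 'type': 'HypotheticalMap', 'input': 'read'},
        # One sink takes a transformed result, another the raw source.
        {'name': 'write_clean', 'type': 'HypotheticalWrite', 'input': 'clean'},
        {'name': 'write_raw', 'type': 'HypotheticalWrite', 'input': 'read'},
    ],
}


def consumers(spec, name):
    """Return the names of the transforms that read from `name`."""
    return [t['name'] for t in spec['transforms'] if t.get('input') == name]
```

   With explicit wiring, both branching cases (multiple chains per source, 
sinking an intermediate result) fall out without any special handling. 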



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
