robertwb commented on code in PR #25200:
URL: https://github.com/apache/beam/pull/25200#discussion_r1093520635


##########
sdks/python/apache_beam/pipeline.py:
##########
@@ -525,6 +525,31 @@ def run(self, test_runner_api='AUTO'):
     self.contains_external_transforms = (
         ExternalTransformFinder.contains_external_transforms(self))
 
+    # Checks whether RunInference has side inputs enabled.
+    # Also checks whether the side input window is global and has a
+    # non-default trigger.
+    run_inference_visitor = RunInferenceVisitor().visit_run_inference(self)
+    self._run_inference_contains_side_input = (
+        run_inference_visitor.contains_run_inference_side_inputs)
+
+    self.run_inference_global_window_non_default_trigger = (
+        run_inference_visitor.contains_global_windows_non_default_trigger)
+
+    if (self._run_inference_contains_side_input and
+        not self._options.view_as(StandardOptions).streaming):
+      raise RuntimeError(
+          "SideInputs to RunInference PTransform is only supported "

Review Comment:
   > > If you do have a streaming source, the pipeline should automatically run in streaming mode.
   > 
   > AFAIK, this isn't actually true in Python today.
   
   If that's the case, we should definitely fix this :). 
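   
   (For context, a minimal sketch of what Python users have to do today, not code from this PR: streaming mode is opted into explicitly via `StandardOptions` or the `--streaming` flag rather than being inferred from the source.)
   
   ```python
   # Minimal sketch (not from this PR): in the Python SDK today, streaming mode
   # is enabled explicitly rather than inferred from an unbounded source.
   from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

   options = PipelineOptions()  # e.g. built from command-line args
   options.view_as(StandardOptions).streaming = True  # same effect as passing --streaming
   ```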
   
   > > Just because a feature is more useful in one mode than the other doesn't mean we should prohibit it in the other unless it's actually detrimental.
   > 
   > I think it's detrimental in the batch use case because understanding the behavior requires understanding runner internals (in this case, how Dataflow/other runners handle stages), and the behavior is very confusing if you don't understand those internals. This is antithetical to one of the purposes of RunInference: taking a hard Beam thing and making it easy.
   
   Could you clarify a bit more why understanding runner internals is required for understanding what happens in the batch case (or, in other words, what confusing behavior the user would run into)? I'm not proposing we do away with the "easy" mode when the model is known at compile time (for batch or streaming), but rather that we allow its computation to be deferred in both modes if this is explicitly what the user asks for.
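   
   To make that distinction concrete, here is a rough sketch of the two usages as I understand them; the `model_metadata_pcoll` keyword, the `ModelMetadata` elements, and the placeholder paths are my reading of the current RunInference API, not code from this PR:
   
   ```python
   # Rough sketch, not from this PR: "easy" mode vs. deferred (side-input) mode.
   import numpy
   import apache_beam as beam
   from apache_beam.ml.inference.base import ModelMetadata, RunInference
   from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy

   # Placeholder handler and model path, purely for illustration.
   model_handler = SklearnModelHandlerNumpy(model_uri='gs://bucket/model.pkl')

   with beam.Pipeline() as p:
     examples = p | 'MakeExamples' >> beam.Create(
         [numpy.array([1.0, 2.0]), numpy.array([3.0, 4.0])])

     # "Easy" mode: the model is fixed when the pipeline is constructed.
     _ = examples | 'StaticModel' >> RunInference(model_handler)

     # Deferred mode: the model to use is delivered via a side-input PCollection
     # of model updates (here faked with a single in-memory element).
     model_updates = p | 'ModelUpdates' >> beam.Create(
         [ModelMetadata(model_id='gs://bucket/model_v2.pkl', model_name='v2')])
     _ = examples | 'SideInputModel' >> RunInference(
         model_handler, model_metadata_pcoll=model_updates)
   ```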
   
   > I want RunInference users to be able to think about Beam as little as possible when building their pipelines, even if it comes at the cost of some of the purity of the unified model.
   
   Yes, for sure. 
   
   > One possible reconciliation here would be to emit a warning in batch mode with an informative error message, or allow users to intentionally enable running in batch mode with a parameter (though neither of those seem like awesome experiences, and I'd still personally favor explicitly disallowing this).
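   
   For reference, a rough sketch of what the warning alternative might look like applied to the check in this diff (illustrative only, not a proposed change; it assumes a module-level `import logging`):
   
   ```python
   # Illustrative sketch only: warn instead of raising when RunInference side
   # inputs are used outside streaming mode.
   if (self._run_inference_contains_side_input and
       not self._options.view_as(StandardOptions).streaming):
     logging.warning(
         'Side inputs to RunInference are generally only useful in streaming '
         'pipelines; in batch mode the model will typically not change during '
         'the run.')
   ```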
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
