gemini-code-assist[bot] commented on code in PR #35874:
URL: https://github.com/apache/beam/pull/35874#discussion_r2279550491


##########
sdks/python/apache_beam/typehints/typecheck_test.py:
##########
@@ -179,6 +179,15 @@ def process(self, element, *args, **kwargs):
       (self.p | beam.Create(['1', '1']) | beam.ParDo(ToInt()))
       self.p.run().wait_until_finish()
 
+  def test_bad_flatten_input(self):
+    with self.assertRaisesRegex(
+        TypeError,
+        "Inputs to Flatten cannot include an iterable of PCollections. "):
+      with beam.Pipeline() as p:
+        pc = p | beam.Create([1, 1])
+        flatten_inputs = [pc, (pc, )]
+        flatten_inputs | beam.Flatten()

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   This is a good test case. To make the validation more comprehensive, 
consider adding another test case for an iterable that contains a mix of types, 
including a `PCollection`, but where the `PCollection` is not the first 
element. The current implementation would miss this, but an improved validation 
logic (as suggested in my other comment) would catch it.
   
   For example:
   ```python
     def test_bad_flatten_input_mixed_iterable(self):
       with self.assertRaisesRegex(
           TypeError,
           "Inputs to Flatten cannot include an iterable of PCollections. "):
         with beam.Pipeline() as p:
           pc = p | beam.Create([1, 1])
           flatten_inputs = [pc, (1, pc)]
           flatten_inputs | beam.Flatten()
   ```
   This would ensure the validation is robust against more subtle user errors.



##########
sdks/python/apache_beam/transforms/core.py:
##########
@@ -3849,6 +3849,15 @@ def _extract_input_pvalues(self, pvalueish):
       raise ValueError(
           'Input to Flatten must be an iterable. '
           'Got a value of type %s instead.' % type(pvalueish))
+
+    # Spot check to see if any of the items are iterables of PCollections
+    # and raise an error if so. This is always a user-error
+    for idx, item in enumerate(pvalueish):
+      if (isinstance(item, typing.Sequence) and len(item) and
+          isinstance(item[0], pvalue.PCollection)):
+        raise TypeError(
+            'Inputs to Flatten cannot include an iterable of PCollections. '
+            f'(input at index {idx}: "{item}")')

Review Comment:
   ![high](https://www.gstatic.com/codereviewagent/high-priority.svg)
   
   The current check for nested PCollections is a good start, but it's not 
fully robust. It only checks if the first element of a sequence is a 
`PCollection`, which can miss cases like `[1, pcoll]`. Also, `typing.Sequence` 
is a broad ABC that can include types like `str`, which is probably not 
intended.
   
   To make this validation more robust and specific, I suggest checking for 
`list` or `tuple` instances and then verifying if *any* of their elements are 
`PCollection`s. This will correctly handle more edge cases.
   
   ```suggestion
         if isinstance(item, (list, tuple)) and any(
             isinstance(sub_item, pvalue.PCollection) for sub_item in item):
           raise TypeError(
               'Inputs to Flatten cannot include an iterable of PCollections. '
               f'(input at index {idx}: "{item}")')
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to