gemini-code-assist[bot] commented on code in PR #35874: URL: https://github.com/apache/beam/pull/35874#discussion_r2279550491
########## sdks/python/apache_beam/typehints/typecheck_test.py: ########## @@ -179,6 +179,15 @@ def process(self, element, *args, **kwargs): (self.p | beam.Create(['1', '1']) | beam.ParDo(ToInt())) self.p.run().wait_until_finish() + def test_bad_flatten_input(self): + with self.assertRaisesRegex( + TypeError, + "Inputs to Flatten cannot include an iterable of PCollections. "): + with beam.Pipeline() as p: + pc = p | beam.Create([1, 1]) + flatten_inputs = [pc, (pc, )] + flatten_inputs | beam.Flatten() Review Comment:  This is a good test case. To make the validation more comprehensive, consider adding another test case for an iterable that contains a mix of types, including a `PCollection`, but where the `PCollection` is not the first element. The current implementation would miss this, but an improved validation logic (as suggested in my other comment) would catch it. For example: ```python def test_bad_flatten_input_mixed_iterable(self): with self.assertRaisesRegex( TypeError, "Inputs to Flatten cannot include an iterable of PCollections. "): with beam.Pipeline() as p: pc = p | beam.Create([1, 1]) flatten_inputs = [pc, (1, pc)] flatten_inputs | beam.Flatten() ``` This would ensure the validation is robust against more subtle user errors. ########## sdks/python/apache_beam/transforms/core.py: ########## @@ -3849,6 +3849,15 @@ def _extract_input_pvalues(self, pvalueish): raise ValueError( 'Input to Flatten must be an iterable. ' 'Got a value of type %s instead.' % type(pvalueish)) + + # Spot check to see if any of the items are iterables of PCollections + # and raise an error if so. This is always a user-error + for idx, item in enumerate(pvalueish): + if (isinstance(item, typing.Sequence) and len(item) and + isinstance(item[0], pvalue.PCollection)): + raise TypeError( + 'Inputs to Flatten cannot include an iterable of PCollections. ' + f'(input at index {idx}: "{item}")') Review Comment:  The current check for nested PCollections is a good start, but it's not fully robust. It only checks if the first element of a sequence is a `PCollection`, which can miss cases like `[1, pcoll]`. Also, `typing.Sequence` is a broad ABC that can include types like `str`, which is probably not intended. To make this validation more robust and specific, I suggest checking for `list` or `tuple` instances and then verifying if *any* of their elements are `PCollection`s. This will correctly handle more edge cases. ```suggestion if isinstance(item, (list, tuple)) and any( isinstance(sub_item, pvalue.PCollection) for sub_item in item): raise TypeError( 'Inputs to Flatten cannot include an iterable of PCollections. ' f'(input at index {idx}: "{item}")') ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org