Abacn commented on code in PR #37855:
URL: https://github.com/apache/beam/pull/37855#discussion_r2940613160
##########
sdks/python/apache_beam/typehints/row_type_test.py:
##########
@@ -85,6 +86,94 @@ def generate(num: int):
| 'Count Elements' >> beam.Map(self._check_key_type_and_count))
assert_that(result, equal_to([10] * 100))
+ def test_group_by_key_namedtuple_union(self):
+ Tuple1 = typing.NamedTuple("Tuple1", [("id", int)])
+
+ Tuple2 = typing.NamedTuple("Tuple2", [("id", int), ("name", str)])
+
+ def generate(num: int):
+ for i in range(2):
+ yield (Tuple1(i), num)
+ yield (Tuple2(i, 'a'), num)
+
+ pipeline = TestPipeline(is_integration_test=False)
+
+ with pipeline as p:
+ result = (
+ p
+ | 'Create' >> beam.Create([i for i in range(2)])
+ | 'Generate' >> beam.ParDo(generate).with_output_types(
+ tuple[(Tuple1 | Tuple2), int])
+ | 'GBK' >> beam.GroupByKey()
+ | 'Count' >> beam.Map(lambda x: len(x[1])))
+ assert_that(result, equal_to([2] * 4))
+
+ # A union of dataclasses as a type hint currently resolves to
+ # FastPrimitiveCoder, which fails at GBK
+ @unittest.skip("https://github.com/apache/beam/issues/22085")
Review Comment:
The more I dig into this, the more gaps I find in the typehint<->schema
mapping. This skipped test (the dataclass counterpart of the namedtuple test
above) demonstrates a current failure: CoderRegistry.get_coder does not handle
UnionTypeConstraint:
https://github.com/apache/beam/blob/487696c9fd1accad92194a6be93dd07f5fab57e5/sdks/python/apache_beam/coders/typecoders.py#L154
so it always falls back to FastPrimitiveCoder, which cannot encode a
non-frozen dataclass. Even if it could, the encoding would not be portable
(it is backed by pickle).
I decided to stop here for this Beam release, as this PR is sufficient for
basic dataclass support in a backward-compatible way, and it fixes two
pre-existing bugs that currently also affect named tuples.
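To illustrate the gap, here is a minimal standard-library sketch of how a union type hint can slip past a coder lookup and land on the fallback. Everything in it is a hypothetical stand-in (`TOY_REGISTRY`, `toy_get_coder`, and the coder names returned as strings); it does not call or reproduce Beam's `CoderRegistry`, it only mimics the dispatch shape described above.

```python
import dataclasses
import typing


@dataclasses.dataclass
class Row1:
    id: int


@dataclasses.dataclass
class Row2:
    id: int
    name: str


# Hypothetical registry: maps concrete classes to coder names (not Beam's API).
TOY_REGISTRY = {int: "VarIntCoder", str: "StrUtf8Coder"}


def toy_get_coder(typehint):
    """Return a coder name for typehint, or the fallback if nothing matches."""
    if dataclasses.is_dataclass(typehint):
        # A single dataclass can be mapped to a schema-aware coder.
        return "RowCoder"
    if typehint in TOY_REGISTRY:
        return TOY_REGISTRY[typehint]
    # A union is neither a dataclass nor a registered class, so it ends up
    # here even though every member of the union could be handled on its own.
    return "FallbackPickleCoder"


single = toy_get_coder(Row1)
union = toy_get_coder(typing.Union[Row1, Row2])
print(single, union)
```

In this toy version the individual dataclass resolves to a schema-aware coder while the union of two such dataclasses does not, which is the same shape as the failure the skipped test is meant to capture.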
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]