Abacn commented on code in PR #37855:
URL: https://github.com/apache/beam/pull/37855#discussion_r2940613160


##########
sdks/python/apache_beam/typehints/row_type_test.py:
##########
@@ -85,6 +86,94 @@ def generate(num: int):
           | 'Count Elements' >> beam.Map(self._check_key_type_and_count))
       assert_that(result, equal_to([10] * 100))
 
+  def test_group_by_key_namedtuple_union(self):
+    Tuple1 = typing.NamedTuple("Tuple1", [("id", int)])
+
+    Tuple2 = typing.NamedTuple("Tuple2", [("id", int), ("name", str)])
+
+    def generate(num: int):
+      for i in range(2):
+        yield (Tuple1(i), num)
+        yield (Tuple2(i, 'a'), num)
+
+    pipeline = TestPipeline(is_integration_test=False)
+
+    with pipeline as p:
+      result = (
+          p
+          | 'Create' >> beam.Create([i for i in range(2)])
+          | 'Generate' >> beam.ParDo(generate).with_output_types(
+              tuple[(Tuple1 | Tuple2), int])
+          | 'GBK' >> beam.GroupByKey()
+          | 'Count' >> beam.Map(lambda x: len(x[1])))
+      assert_that(result, equal_to([2] * 4))
+
+  # Union of dataclasses as type hint currently result in FastPrimitiveCoder
+  # fails at GBK
+  @unittest.skip("https://github.com/apache/beam/issues/22085")

Review Comment:
   The more I dig in, the more gaps I find in the typehint<->schema mapping. 
This test is the dataclass counterpart of the namedtuple test above.
   
   Currently this fails because CoderRegistry.get_coder does not handle 
UnionTypeConstraint:
   
   
https://github.com/apache/beam/blob/487696c9fd1accad92194a6be93dd07f5fab57e5/sdks/python/apache_beam/coders/typecoders.py#L154
   
   and it always falls back to FastPrimitiveCoder, which cannot encode 
non-frozen dataclasses. Even if it could, the encoding would not be portable 
(it is backed by pickle).
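
   A toy sketch of the fallback described above, using only the standard 
library. `TOY_CODERS`, `toy_get_coder`, `is_frozen_dataclass`, and the two key 
classes are hypothetical names for illustration and are not Beam's API; the 
exact-match lookup is an assumed simplification of what the registry does. It 
only shows why a union hint can miss a lookup that each member would hit, and 
how the frozen flag of a dataclass can be inspected.

```python
import dataclasses
import typing


@dataclasses.dataclass
class MutableKey:
  id: int


@dataclasses.dataclass(frozen=True)
class FrozenKey:
  id: int


# Hypothetical stand-in for a coder registry; NOT Beam's CoderRegistry.
TOY_CODERS = {MutableKey: 'row-coder', FrozenKey: 'row-coder'}
FALLBACK = 'pickle-backed-fallback'


def toy_get_coder(hint):
  """Exact-match lookup with no branch for union hints (assumed shape)."""
  return TOY_CODERS.get(hint, FALLBACK)


def is_frozen_dataclass(cls):
  """True only for dataclasses declared with frozen=True.

  Relies on the undocumented CPython attribute __dataclass_params__.
  """
  return (
      dataclasses.is_dataclass(cls) and cls.__dataclass_params__.frozen)


union_hint = typing.Union[MutableKey, FrozenKey]

# Each member resolves on its own, but the union misses the exact-match
# lookup and lands on the fallback.
print(toy_get_coder(MutableKey))
print(toy_get_coder(union_hint))
print(is_frozen_dataclass(MutableKey), is_frozen_dataclass(FrozenKey))
```

   A registry that wanted to handle the union would need an explicit branch 
that inspects its members, for example via `typing.get_args`.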
   
   I have decided to stop here for this Beam release: this PR is sufficient 
for basic dataclass support in a backward-compatible way, and it fixes two 
pre-existing bugs that currently also affect named tuples.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.