[ https://issues.apache.org/jira/browse/BEAM-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17132236#comment-17132236 ]
Beam JIRA Bot commented on BEAM-4132:
-------------------------------------
This issue is assigned but has not received an update in 30 days so it has been
labeled "stale-assigned". If you are still working on the issue, please give an
update and remove the label. If you are no longer working on the issue, please
unassign so someone else may work on it. In 7 days the issue will be
automatically unassigned.
> Element type inference doesn't work for multi-output DoFns
> ----------------------------------------------------------
>
> Key: BEAM-4132
> URL: https://issues.apache.org/jira/browse/BEAM-4132
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Affects Versions: 2.4.0
> Reporter: Chuan Yu Foo
> Assignee: Udi Meiri
> Priority: P2
> Labels: stale-assigned
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> TLDR: if you have a multi-output DoFn, then the non-main (tagged) PCollections
> incorrectly have their element types set to None. This causes spurious type
> check failures for transforms that consume these PCollections.
> Minimal example:
> {code}
> import apache_beam as beam
>
> class TripleDoFn(beam.DoFn):
>     def process(self, elem):
>         yield elem  # main output
>         if elem % 2 == 0:
>             yield beam.pvalue.TaggedOutput('ten_times', elem * 10)
>         if elem % 3 == 0:
>             yield beam.pvalue.TaggedOutput('hundred_times', elem * 100)
>
> @beam.typehints.with_input_types(int)
> @beam.typehints.with_output_types(int)
> class MultiplyBy(beam.DoFn):
>     def __init__(self, multiplier):
>         self._multiplier = multiplier
>
>     def process(self, elem):
>         # process() must return an iterable, so yield rather than return.
>         yield elem * self._multiplier
>
> def main():
>     with beam.Pipeline() as p:
>         x, a, b = (
>             p
>             | 'Create' >> beam.Create([1, 2, 3])
>             | 'TripleDo' >> beam.ParDo(TripleDoFn()).with_outputs(
>                 'ten_times', 'hundred_times', main='main_output'))
>         # Fails type checking at pipeline construction: 'a' has element type None.
>         _ = a | 'MultiplyBy2' >> beam.ParDo(MultiplyBy(2))
>
> if __name__ == '__main__':
>     main()
> {code}
> Running this yields the following error:
> {noformat}
> apache_beam.typehints.decorators.TypeCheckError: Type hint violation for
> 'MultiplyBy2': requires <type 'int'> but got None for elem
> {noformat}
> Replacing {{a}} with {{b}} yields the same error. Replacing {{a}} with {{x}}
> instead yields the following error:
> {noformat}
> apache_beam.typehints.decorators.TypeCheckError: Type hint violation for
> 'MultiplyBy2': requires <type 'int'> but got Union[TaggedOutput, int] for elem
> {noformat}
> I would expect Beam to correctly infer that {{a}} and {{b}} have element
> types of {{int}} rather than {{None}}, and I would also expect Beam to
> correctly figure out that the element types of {{x}} are compatible with
> {{int}}.
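To make the expected per-output typing concrete, here is a small pure-Python sketch. It is not Beam's type-inference code, which analyses the DoFn without running it; it only illustrates the result the report expects. The {{TaggedOutput}} NamedTuple is a hypothetical stand-in for {{beam.pvalue.TaggedOutput}}, and {{observed_types_per_output}} is a hypothetical helper that runs the same yields as {{TripleDoFn}} on sample inputs and groups the emitted value types by output tag, unwrapping tagged values.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, NamedTuple, Set


class TaggedOutput(NamedTuple):
    """Hypothetical stand-in for beam.pvalue.TaggedOutput: an output tag plus a value."""
    tag: str
    value: Any


def triple_process(elem: int) -> Iterable[Any]:
    """Mirrors the yields of TripleDoFn.process from the report."""
    yield elem
    if elem % 2 == 0:
        yield TaggedOutput('ten_times', elem * 10)
    if elem % 3 == 0:
        yield TaggedOutput('hundred_times', elem * 100)


def observed_types_per_output(
        process: Callable[[Any], Iterable[Any]],
        inputs: List[Any],
        main: str = 'main_output') -> Dict[str, Set[type]]:
    """Hypothetical helper: runs `process` on sample inputs and groups the
    types of the emitted values by output tag.

    Untagged values are attributed to `main`; TaggedOutput wrappers are
    unwrapped and attributed to their tag, so the wrapper type never
    shows up in the main output's type set.
    """
    types: Dict[str, Set[type]] = defaultdict(set)
    for elem in inputs:
        for out in process(elem):
            if isinstance(out, TaggedOutput):
                types[out.tag].add(type(out.value))
            else:
                types[main].add(type(out))
    return dict(types)


# Every output, including the main one, ends up with {int}.
print(observed_types_per_output(triple_process, [1, 2, 3]))
```

Because the wrapper is stripped before recording a type, the main output is attributed {{int}} rather than {{Union[TaggedOutput, int]}}, and each tagged output gets {{int}} rather than {{None}}.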
--
This message was sent by Atlassian Jira
(v8.3.4#803005)