[ 
https://issues.apache.org/jira/browse/BEAM-13217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Willi Schinmeyer updated BEAM-13217:
------------------------------------
    Description: 
After upgrading our Python project from 2.31.0 to 2.33.0, we started getting 
TypeCheckErrors such as
{quote}apache_beam.typehints.decorators.TypeCheckError: Type hint violation for 
'all_data/combine_new_and_all': requires {{Tuple[Tuple[Any, Any], Dict[str, 
Iterable[_CombinedEntry]]]}} but got {{Tuple[Tuple[int, int], Dict[str, 
List[Union[]]]]}} for element
{quote}
where the output value of a {{CoGroupByKey()}} is apparently incorrectly 
deduced to be a {{Dict[str, List[Union[]]]}}.

I managed to build a small repro case:
{code:python}
import apache_beam as beam
from typing import Dict, Iterable, Tuple

{
    "foo": [(42, "foo")],
    "bar": [(42, "bar")],
} | beam.CoGroupByKey().with_output_types(Tuple[int, Dict[str, Iterable[str]]])
{code}
which raises
{quote}apache_beam.typehints.decorators.TypeCheckError: Output type hint 
violation at CoGroupByKey: expected {{Tuple[int, Dict[str, Iterable[str]]]}}, 
got {{Tuple[int, Dict[str, List[Union[]]]]}}
{quote}
or alternatively, using a TestPipeline:
{code:python}
import apache_beam as beam
from apache_beam.testing.test_pipeline import TestPipeline
from apache_beam.testing.util import assert_that, equal_to
from typing import Dict, Iterable, Tuple

with TestPipeline() as p:
    actual = {
        "foo": p | "create_foo" >> beam.Create([(42, "foo")]),
        "bar": p | "create_bar" >> beam.Create([(42, "bar")]),
    } | beam.CoGroupByKey().with_output_types(Tuple[int, Dict[str, Iterable[str]]])
    assert_that(actual, equal_to([(42, {"foo": ["foo"], "bar": ["bar"]})]))
{code}
One more thing, regarding the {{Tuple[Any, Any]}} in the original error 
message: we can reproduce it like this:
{code:python}
import apache_beam as beam
from typing import Dict, Iterable, NewType, Tuple

key = NewType("key", int)
{
    "foo": [(key(1337), "foo")],
    "bar": [(key(1337), "bar")],
} | beam.CoGroupByKey().with_output_types(Tuple[key, Dict[str, Iterable[str]]])
{code}
{quote}apache_beam.typehints.decorators.TypeCheckError: Output type hint 
violation at CoGroupByKey: expected {{Tuple[Any, Dict[str, Iterable[str]]]}}, 
got {{Tuple[int, Dict[str, List[Union[]]]]}}
{quote}
It looks like {{NewType}} is treated as {{Any}}? That surprised me.
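
For context on the "got" side of that message: at runtime, {{typing.NewType}} returns a callable that hands its argument back unchanged, so {{key(1337)}} is a plain {{int}}. A minimal sketch using only the standard library follows; it does not touch Beam and makes no claim about why Beam reports the declared hint as {{Any}}.

```python
from typing import NewType

key = NewType("key", int)

value = key(1337)

# NewType adds no runtime wrapper: calling it returns the argument unchanged,
# so the element that actually flows through the pipeline is a plain int.
print(type(value) is int)

# The value compares equal to the underlying integer.
print(value == 1337)

# The declared base type remains available on the __supertype__ attribute.
print(key.__supertype__ is int)
```

All three checks print {{True}}, which matches the {{Tuple[int, ...]}} key type on the "got" side of the error above.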

I could also reproduce the issue in 2.32.0.

  was:
After upgrading our Python project from 2.31.0 to 2.33.0, we started getting 
TypeCheckErrors such as
{quote}apache_beam.typehints.decorators.TypeCheckError: Type hint violation for 
'all_data/combine_new_and_all': requires {{Tuple[Tuple[Any, Any], Dict[str, 
Iterable[_CombinedEntry]]]}} but got {{Tuple[Tuple[int, int], Dict[str, 
List[Union[]]]]}} for element
{quote}
where the output value of a {{CoGroupByKey()}} is apparently incorrectly 
deduced to be a {{Dict[str, List[Union[]]]}}.

I managed to build a small repro case:
{code:python}
import apache_beam as beam
from typing import Dict, Iterable, Tuple

{
    "foo": [(42, "foo")],
    "bar": [(42, "bar")],
} | beam.CoGroupByKey().with_output_types(Tuple[int, Dict[str, Iterable[str]]])
{code}
which raises
{quote}apache_beam.typehints.decorators.TypeCheckError: Output type hint 
violation at CoGroupByKey: expected {{Tuple[int, Dict[str, Iterable[str]]]}}, 
got {{Tuple[int, Dict[str, List[Union[]]]]}}
{quote}
or alternatively, using a TestPipeline:
{code:python}
import apache_beam as beam
from apache_beam.testing.test_pipeline import TestPipeline
from apache_beam.testing.util import assert_that, equal_to
from typing import Dict, Iterable, Tuple

with TestPipeline() as p:
    actual = {
        "foo": p | "create_foo" >> beam.Create([(42, "foo")]),
        "bar": p | "create_bar" >> beam.Create([(42, "bar")]),
    } | beam.CoGroupByKey().with_output_types(Tuple[int, Dict[str, Iterable[str]]])
    assert_that(actual, equal_to([(42, {"foo": ["foo"], "bar": ["bar"]})]))
{code}
One more thing, regarding the {{Tuple[Any, Any]}} in the original error 
message: we can reproduce it like this:
{code:python}
import apache_beam as beam
from typing import Dict, Iterable, NewType, Tuple

key = NewType("key", int)
{
    "foo": [(key(1337), "foo")],
    "bar": [(key(1337), "bar")],
} | beam.CoGroupByKey().with_output_types(Tuple[key, Dict[str, Iterable[str]]])
{code}
{quote}apache_beam.typehints.decorators.TypeCheckError: Output type hint 
violation at CoGroupByKey: expected {{Tuple[Any, Dict[str, Iterable[str]]]}}, 
got {{Tuple[int, Dict[str, List[Union[]]]]}}
{quote}
It looks like {{NewType}} is treated as {{Any}}? That surprised me.


> TypeCheckError due to CoGroupByKey output mis-deduction
> -------------------------------------------------------
>
>                 Key: BEAM-13217
>                 URL: https://issues.apache.org/jira/browse/BEAM-13217
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>    Affects Versions: 2.32.0, 2.33.0
>            Reporter: Willi Schinmeyer
>            Priority: P2
>
> After upgrading our Python project from 2.31.0 to 2.33.0, we started getting 
> TypeCheckErrors such as
> {quote}apache_beam.typehints.decorators.TypeCheckError: Type hint violation 
> for 'all_data/combine_new_and_all': requires {{Tuple[Tuple[Any, Any], 
> Dict[str, Iterable[_CombinedEntry]]]}} but got {{Tuple[Tuple[int, int], 
> Dict[str, List[Union[]]]]}} for element
> {quote}
> where the output value of a {{CoGroupByKey()}} is apparently incorrectly 
> deduced to be a {{Dict[str, List[Union[]]]}}.
> I managed to build a small repro case:
> {code:python}
> import apache_beam as beam
> from typing import Dict, Iterable, Tuple
> {
>     "foo": [(42, "foo")],
>     "bar": [(42, "bar")],
> } | beam.CoGroupByKey().with_output_types(Tuple[int, Dict[str, Iterable[str]]])
> {code}
> which raises
> {quote}apache_beam.typehints.decorators.TypeCheckError: Output type hint 
> violation at CoGroupByKey: expected {{Tuple[int, Dict[str, Iterable[str]]]}}, 
> got {{Tuple[int, Dict[str, List[Union[]]]]}}
> {quote}
> or alternatively, using a TestPipeline:
> {code:python}
> import apache_beam as beam
> from apache_beam.testing.test_pipeline import TestPipeline
> from apache_beam.testing.util import assert_that, equal_to
> from typing import Dict, Iterable, Tuple
> with TestPipeline() as p:
>     actual = {
>         "foo": p | "create_foo" >> beam.Create([(42, "foo")]),
>         "bar": p | "create_bar" >> beam.Create([(42, "bar")]),
>     } | beam.CoGroupByKey().with_output_types(Tuple[int, Dict[str, Iterable[str]]])
>     assert_that(actual, equal_to([(42, {"foo": ["foo"], "bar": ["bar"]})]))
> {code}
> One more thing, regarding the {{Tuple[Any, Any]}} in the original error 
> message: we can reproduce it like this:
> {code:python}
> import apache_beam as beam
> from typing import Dict, Iterable, NewType, Tuple
> key = NewType("key", int)
> {
>     "foo": [(key(1337), "foo")],
>     "bar": [(key(1337), "bar")],
> } | beam.CoGroupByKey().with_output_types(Tuple[key, Dict[str, Iterable[str]]])
> {code}
> {quote}apache_beam.typehints.decorators.TypeCheckError: Output type hint 
> violation at CoGroupByKey: expected {{Tuple[Any, Dict[str, Iterable[str]]]}}, 
> got {{Tuple[int, Dict[str, List[Union[]]]]}}
> {quote}
> It looks like {{NewType}} is treated as {{Any}}? That surprised me.
> I could also reproduce the issue in 2.32.0.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
