Blaine Hansen created BEAM-13166:
------------------------------------

             Summary: Versions after `2.28.0` fail to infer grouping decoders 
after a date is selected from a data structure
                 Key: BEAM-13166
                 URL: https://issues.apache.org/jira/browse/BEAM-13166
             Project: Beam
          Issue Type: Bug
          Components: sdk-py-core
    Affects Versions: 2.33.0, 2.32.0, 2.31.0, 2.30.0, 2.29.0
         Environment: We're using python linux docker images, such as 
`python:bullseye`, and building an image that installs packages from a 
`requirements.txt` file with a beam requirement such as `apache-beam ~= 2.28.0`
            Reporter: Blaine Hansen
             Fix For: 2.28.0


The code below throws this type error on the affected versions, but works as 
expected on 2.28.0:

`TypeError: Unable to deterministically encode '2021-11-02' of type '<class 
'datetime.date'>', please provide a type hint for the input of 'GroupByKey' 
[while running 'Create/Map(decode)']`

{code:python}
import typing
from datetime import date
import apache_beam as beam
from apache_beam.testing.test_pipeline import TestPipeline

with TestPipeline() as pipeline:
    today = date.today()
    results = (
        pipeline
        | beam.Create([(1, {'d': today}), (1, {'d': today})])
        # This step only requires output type hints on versions after 2.28.0,
        # and only if the date is being "projected" from some other data structure.
        | beam.MapTuple(lambda i, d: (d['d'], i))
        # If this aggregation is removed, the pipeline also works without error.
        | beam.CombinePerKey(sum)
    )

    results | beam.Map(print)
{code}

This Stack Overflow question describes the same problem:
https://stackoverflow.com/questions/69409693/how-do-i-use-a-datetime-date-value-in-apache-beam-groupby

It's possible to work around the error by registering a `DateCoder` and adding 
output type hints to the projecting `MapTuple` step, but since the same code 
works in other situations and on 2.28.0, this looks like a bug. Our production 
pipelines would need many of these tedious type hints to work properly, so 
we're effectively blocked from upgrading to the newest version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
