Blaine Hansen created BEAM-13166:
------------------------------------
Summary: Versions after `2.28.0` fail to infer grouping decoders
after a date is selected from a data structure
Key: BEAM-13166
URL: https://issues.apache.org/jira/browse/BEAM-13166
Project: Beam
Issue Type: Bug
Components: sdk-py-core
Affects Versions: 2.33.0, 2.32.0, 2.31.0, 2.30.0, 2.29.0
Environment: We're using Python Linux Docker images, such as
`python:bullseye`, and building an image that installs packages from a
`requirements.txt` file with a beam requirement such as `apache-beam ~= 2.28.0`
Reporter: Blaine Hansen
Fix For: 2.28.0
The code below throws this TypeError on the affected versions, but works as
expected on 2.28.0:
`TypeError: Unable to deterministically encode '2021-11-02' of type '<class
'datetime.date'>', please provide a type hint for the input of 'GroupByKey'
[while running 'Create/Map(decode)']`
{code:python}
import typing
from datetime import date

import apache_beam as beam
from apache_beam.testing.test_pipeline import TestPipeline

with TestPipeline() as pipeline:
    today = date.today()
    results = (
        pipeline
        | beam.Create([(1, {'d': today}), (1, {'d': today})])
        # This step only requires output type hints on versions after 2.28.0,
        # and only if the date is being "projected" from some other data structure.
        | beam.MapTuple(lambda i, d: (d['d'], i))
        # If this aggregation is removed, the pipeline also works without error.
        | beam.CombinePerKey(sum)
    )
    results | beam.Map(print)
{code}
This Stack Overflow question describes the same problem:
https://stackoverflow.com/questions/69409693/how-do-i-use-a-datetime-date-value-in-apache-beam-groupby
It's possible to fix the errors by registering a `DateCoder` and adding output
type hints to the projection `MapTuple` step, but since the same code works in
other situations and versions, this looks like a bug. Our production pipelines
would need many of these tedious type hints in order to work properly, so
we're effectively blocked from upgrading to the newest version.