[ https://issues.apache.org/jira/browse/BEAM-12593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17415798#comment-17415798 ]
Brian Hulette commented on BEAM-12593:
--------------------------------------
I looked into BEAM-12764. The error occurs because Dataflow workers are
failing to unpickle DoFns created by the DataFrame API. The DoFns include
serialized pandas DataFrames, which are pickled with pandas 1.3.x after this
change, while Dataflow workers are still on pandas 1.2.x. My proposed solution:
- go ahead and upgrade the Dataflow worker to use pandas 1.3.x
- re-apply pr/15165 with a patch that bumps the worker container
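The failure above is an older pandas (the worker) loading a pickle written by a newer pandas (the job submitter). Below is a minimal sketch of how the writer's pandas version could travel with the payload, so that a worker fails with a clear message instead of an opaque unpickling error. It assumes a hypothetical envelope format: `wrap_payload`, `unwrap_payload`, and `PandasVersionMismatch` are illustrative names, not Beam or Dataflow APIs. In real code the version strings would come from `pandas.__version__` on each side. Here they are passed in as plain strings so the sketch runs with the standard library alone.

```python
import pickle
from typing import Any, Tuple


class PandasVersionMismatch(RuntimeError):
    """Raised when a payload was pickled with a newer pandas than the reader has."""


def parse_version(version: str) -> Tuple[int, ...]:
    """Keep the leading numeric components, e.g. '1.3.0rc1' -> (1, 3, 0)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def wrap_payload(obj: Any, writer_pandas_version: str) -> bytes:
    # The payload is pickled on its own first, so the outer envelope holds only
    # a str and bytes and can always be unpickled, whatever pandas is installed.
    envelope = {
        "pandas_version": writer_pandas_version,
        "payload": pickle.dumps(obj),
    }
    return pickle.dumps(envelope)


def unwrap_payload(data: bytes, reader_pandas_version: str) -> Any:
    envelope = pickle.loads(data)
    writer = envelope["pandas_version"]
    # Assumption: compare only major.minor; a reader on an older minor
    # release is treated as unable to load the payload.
    if parse_version(reader_pandas_version)[:2] < parse_version(writer)[:2]:
        raise PandasVersionMismatch(
            f"payload pickled with pandas {writer}, "
            f"but this worker has pandas {reader_pandas_version}"
        )
    # Only now unpickle the inner payload, which may reference pandas internals.
    return pickle.loads(envelope["payload"])


if __name__ == "__main__":
    blob = wrap_payload({"a": [1, 2]}, writer_pandas_version="1.3.2")
    print(unwrap_payload(blob, reader_pandas_version="1.3.0"))  # {'a': [1, 2]}
    try:
        unwrap_payload(blob, reader_pandas_version="1.2.5")
    except PandasVersionMismatch as err:
        print(err)
```

Nesting the pickle is the key design choice: if the DataFrame were pickled directly inside the envelope, an incompatible worker would fail inside `pickle.loads` before the version check could run.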
Note that pandas tries to maintain backwards compatibility for pickled
DataFrames, so having a newer version on the Dataflow workers shouldn't be an
issue for serialization. (A mismatched pandas version could still lead to
undefined behavior in DataFrame API operations.)
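The compatibility argument above distinguishes three cases by comparing the pandas version that pickled a DataFrame (the writer) with the one on the worker (the reader). Here is a small sketch of that rule. It assumes versions of the form `major.minor[...]` and treats only the major.minor pair as significant. `Compat`, `major_minor`, and `classify` are hypothetical names used for illustration only.

```python
from enum import Enum
from typing import Tuple


class Compat(Enum):
    SAME = "same major.minor version"
    NEWER_READER = "unpickling expected to work; DataFrame API behavior may differ"
    OLDER_READER = "unpickling may fail (the BEAM-12764 situation)"


def major_minor(version: str) -> Tuple[int, int]:
    # Assumes at least 'major.minor'; anything after the second dot is ignored,
    # so '1.3.0rc1' -> (1, 3).
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def classify(writer: str, reader: str) -> Compat:
    w, r = major_minor(writer), major_minor(reader)
    if r == w:
        return Compat.SAME
    if r > w:
        # pandas aims to read pickles written by older releases.
        return Compat.NEWER_READER
    return Compat.OLDER_READER


if __name__ == "__main__":
    # DataFrame pickled by pandas 1.3.x, worker still on 1.2.x.
    print(classify("1.3.2", "1.2.5"))  # Compat.OLDER_READER
    # Worker upgraded to 1.3.x, job submitted from an environment on 1.2.x.
    print(classify("1.2.5", "1.3.2"))  # Compat.NEWER_READER
```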
Open question: Why are we creating DoFns with serialized dataframes?
> DataFrame API: Support pandas 1.3.x
> -----------------------------------
>
> Key: BEAM-12593
> URL: https://issues.apache.org/jira/browse/BEAM-12593
> Project: Beam
> Issue Type: Improvement
> Components: dsl-dataframe
> Reporter: Brian Hulette
> Assignee: Brian Hulette
> Priority: P2
> Time Spent: 13h
> Remaining Estimate: 0h
>
> Started a WIP PR using rc1 here: https://github.com/apache/beam/pull/15008
> Now the official 1.3.0 release is out.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)