[
https://issues.apache.org/jira/browse/BEAM-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Burke updated BEAM-14153:
--------------------------------
Resolution: Fixed
Status: Resolved (was: Open)
> Reshuffled Row Coder PCollection used direct to Side Input breaks Dataflow &
> PyPortable
> ---------------------------------------------------------------------------------------
>
> Key: BEAM-14153
> URL: https://issues.apache.org/jira/browse/BEAM-14153
> Project: Beam
> Issue Type: Bug
> Components: sdk-go
> Affects Versions: 2.37.0, 2.38.0
> Reporter: Robert Burke
> Assignee: Robert Burke
> Priority: P2
> Fix For: 2.39.0
>
> Time Spent: 3h 40m
> Remaining Estimate: 0h
>
> Since first-class iterable side inputs were implemented, passing a reshuffled
> PCollection directly to a side input causes a coder mismatch between
> encoding the reshuffle output and decoding it on Dataflow and on Python
> Portable. In particular, the Row values are encoded without a length prefix,
> but the decoder is then asked to read them with a length prefix that was
> never written.
> This is similar to the issue in BEAM-12438, which has been worked around
> with a hack.
> In this instance it's likely more resilient to always length-prefix
> Row-encoded types, and to make that explicit in the pipeline proto. This
> should avoid issues with runners having odd behaviors with respect to row
> coders at this time, while not preventing them from introspecting
> row-encoded values should they choose.
> This may also allow us to avoid the hack for BEAM-12438, though that is
> something to be verified independently.
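
The mismatch described above can be illustrated with a minimal, standalone Go
sketch. It does not use the Beam SDK: encodeRaw, encodeLengthPrefixed and
decodeLengthPrefixed are hypothetical helpers, the payload is a placeholder
rather than a real Row encoding, and the varint length prefix is an assumption
about the general shape of a length-prefixed encoding.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// encodeRaw returns a copy of the payload with no length prefix.
// Hypothetical stand-in for the writer side of the mismatch.
func encodeRaw(payload []byte) []byte {
	return append([]byte(nil), payload...)
}

// encodeLengthPrefixed writes a varint length followed by the payload.
// Hypothetical stand-in for "always length-prefix Row-encoded types".
func encodeLengthPrefixed(payload []byte) []byte {
	prefix := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(prefix, uint64(len(payload)))
	return append(prefix[:n], payload...)
}

// decodeLengthPrefixed reads a varint length, then exactly that many bytes.
// Hypothetical stand-in for the reader side, which expects a prefix.
func decodeLengthPrefixed(data []byte) ([]byte, error) {
	r := bytes.NewReader(data)
	n, err := binary.ReadUvarint(r)
	if err != nil {
		return nil, fmt.Errorf("reading length prefix: %w", err)
	}
	if n > uint64(r.Len()) {
		return nil, fmt.Errorf("length prefix %d exceeds %d remaining bytes", n, r.Len())
	}
	out := make([]byte, n)
	if _, err := io.ReadFull(r, out); err != nil {
		return nil, fmt.Errorf("reading payload: %w", err)
	}
	return out, nil
}

func main() {
	payload := []byte("row-bytes") // placeholder, not a real Row encoding

	// Writer and reader agree on the prefix: the payload round-trips.
	got, err := decodeLengthPrefixed(encodeLengthPrefixed(payload))
	fmt.Println("prefixed:", bytes.Equal(got, payload), err)

	// Writer omits the prefix: the reader treats the first payload byte
	// as a length, so the decode fails or yields different bytes.
	got, err = decodeLengthPrefixed(encodeRaw(payload))
	fmt.Println("raw:", bytes.Equal(got, payload), err)
}
```

Prefixing on the write side every time, as the description proposes, means the
reader's expectation holds regardless of which transform produced the bytes.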