[ https://issues.apache.org/jira/browse/BEAM-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17512131#comment-17512131 ]
Robert Burke commented on BEAM-14153:
-------------------------------------
This would have counted as a regression had it been caught during the 2.37
release, when it was introduced. It takes a very unusual situation to hit it,
though, and it only breaks on Dataflow (where this construct is unlikely to be
used) and on the PyPortable runner (which is harder to use than Flink or Spark,
neither of which is affected).
Bumping the fix to 2.39 so that it does not block 2.38. I was just too
ambitious about when I could get a fix in this week.
> Reshuffled Row Coder PCollection used direct to Side Input breaks Dataflow &
> PyPortable
> ---------------------------------------------------------------------------------------
>
> Key: BEAM-14153
> URL: https://issues.apache.org/jira/browse/BEAM-14153
> Project: Beam
> Issue Type: Bug
> Components: sdk-go
> Affects Versions: 2.37.0
> Reporter: Robert Burke
> Assignee: Robert Burke
> Priority: P2
> Fix For: 2.38.0
>
>
> Since first-class Iterable side inputs were implemented, passing a reshuffled
> PCollection directly to a side input causes a coder mismatch between
> encoding the reshuffle and decoding it on Dataflow and on Python Portable. In
> particular, the Row values are encoded without a length prefix, but the
> decoder is then asked to read them with a length prefix that was never written.
> This is similar to the issue in BEAM-12438, which has been hacked around.
> In this instance it's likely more resilient to always length prefix Row
> encoded types, and to make that explicit in the pipeline proto. This should
> avoid issues with runners having odd behaviors WRT row coders at this time,
> while not preventing them from introspecting row encoded values should they
> choose. This may also allow us to avoid the hack for BEAM-12438, though that
> is something to be verified independently.
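The mismatch described in the quoted issue can be sketched in plain Go. This is only an illustration using the standard library, not the Beam Go SDK's actual coder code: encodeRaw, encodeLengthPrefixed and decodeLengthPrefixed are hypothetical names, and the varint byte count is an assumption about how a length prefix is framed.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// encodeRaw stands in for a row encoder that writes the payload as-is,
// with no length prefix (hypothetical helper, not a Beam API).
func encodeRaw(payload []byte) []byte {
	return append([]byte(nil), payload...)
}

// encodeLengthPrefixed writes a varint byte count followed by the payload.
func encodeLengthPrefixed(payload []byte) []byte {
	prefix := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(prefix, uint64(len(payload)))
	return append(prefix[:n], payload...)
}

// decodeLengthPrefixed expects a varint byte count and then exactly that
// many payload bytes.
func decodeLengthPrefixed(data []byte) ([]byte, error) {
	size, n := binary.Uvarint(data)
	if n <= 0 {
		return nil, errors.New("invalid length prefix")
	}
	rest := data[n:]
	if size != uint64(len(rest)) {
		return nil, fmt.Errorf("length prefix says %d bytes, but %d remain", size, len(rest))
	}
	return rest, nil
}

func main() {
	row := []byte("row-bytes")

	// Encoder and decoder disagree: the first payload byte is misread as a length.
	_, err := decodeLengthPrefixed(encodeRaw(row))
	fmt.Println("raw encoding, prefixed decoding:", err)

	// Both sides agree on the length prefix, so the payload round-trips.
	got, err := decodeLengthPrefixed(encodeLengthPrefixed(row))
	fmt.Println("prefixed on both sides:", string(got), err)
}
```

In this sketch the mismatch is caught only because the misread length disagrees with the remaining byte count. With other payloads the same disagreement could decode garbage without an error, which is one reason to always apply the prefix and state it explicitly rather than infer it.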