[ 
https://issues.apache.org/jira/browse/BEAM-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke updated BEAM-14153:
--------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Open)

> Reshuffled Row Coder PCollection used direct to Side Input breaks Dataflow & 
> PyPortable
> ---------------------------------------------------------------------------------------
>
>                 Key: BEAM-14153
>                 URL: https://issues.apache.org/jira/browse/BEAM-14153
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-go
>    Affects Versions: 2.37.0, 2.38.0
>            Reporter: Robert Burke
>            Assignee: Robert Burke
>            Priority: P2
>             Fix For: 2.39.0
>
>          Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Since first-class Iterable side inputs were implemented, passing a reshuffled 
> PCollection directly to a side input causes a coder mismatch between 
> encoding the reshuffle and decoding it on Dataflow and on Python Portable. In 
> particular, the Row values are encoded without a length prefix, but the 
> decoder is then asked to decode them with a length prefix that was never 
> written.
> This is similar to the issue in BEAM-12438, which has been hacked around. 
> In this instance it's likely more resilient to always length-prefix Row 
> encoded types, and make it explicit in the pipeline proto. This should avoid 
> issues with runners having odd behaviors WRT row coders at this time, while 
> not preventing them from introspecting row encoded values should they choose. 
> This may also allow us to avoid the hack for BEAM-12438, though that is 
> something to be verified independently.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
