Robert Burke created BEAM-14153:
-----------------------------------

             Summary: Reshuffled Row Coder PCollection used direct to Side 
Input breaks Dataflow & PyPortable
                 Key: BEAM-14153
                 URL: https://issues.apache.org/jira/browse/BEAM-14153
             Project: Beam
          Issue Type: Bug
          Components: sdk-go
            Reporter: Robert Burke


Since first-class iterable side inputs were implemented, passing a reshuffled 
PCollection directly to a side input causes a coder mismatch on Dataflow and 
on Python Portable between how the reshuffle output is encoded and how it is 
decoded. In particular, the Row values are encoded without a length prefix, 
but the decoder is then asked to read them with a length prefix that was 
never written.
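
To illustrate the failure mode described above, here is a minimal standalone 
sketch in Go. It does not use the Beam SDK: encodeRaw and decodeLengthPrefixed 
are hypothetical stand-ins, the payload bytes are made up, and the prefix is 
assumed to be a varint byte count. It only shows what happens when bytes 
written without a prefix are read by a decoder that expects one.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// encodeRaw is a hypothetical stand-in for an encoder that writes the
// row payload as-is, with no length prefix.
func encodeRaw(payload []byte) []byte {
	return append([]byte(nil), payload...)
}

// decodeLengthPrefixed is a hypothetical stand-in for a decoder that
// expects a varint byte count followed by exactly that many bytes.
func decodeLengthPrefixed(data []byte) ([]byte, error) {
	r := bytes.NewReader(data)
	n, err := binary.ReadUvarint(r)
	if err != nil {
		return nil, err
	}
	if n > uint64(r.Len()) {
		return nil, errors.New("length prefix exceeds remaining bytes")
	}
	out := make([]byte, int(n))
	if _, err := io.ReadFull(r, out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	// Made-up row bytes; the first byte happens to look like a small length.
	payload := []byte{0x02, 0x00, 0x03, 'f', 'o', 'o'}
	got, err := decodeLengthPrefixed(encodeRaw(payload))
	fmt.Printf("decoded=%v err=%v\n", got, err)

	// A first byte that looks like a large length fails outright.
	_, err = decodeLengthPrefixed(encodeRaw([]byte{0x7f, 0x01}))
	fmt.Printf("err=%v\n", err)
}
```

In the first case the decoder silently returns the wrong bytes, because the 
first payload byte is misread as a length. In the second it reports an error.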

This is similar to the issue in BEAM-12438 which has been hacked around. 

In this instance it is likely more resilient to always length-prefix 
Row-encoded types and to make that explicit in the pipeline proto. This should 
avoid issues with runners behaving inconsistently with respect to row coders 
for now, while not preventing them from introspecting row-encoded values 
should they choose.
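
A rough sketch of why an always-present prefix is resilient, again in 
standalone Go rather than Beam SDK code: appendLengthPrefixed and 
splitElements are hypothetical helpers, and a varint byte count is assumed as 
the prefix. A reader can split a stream into elements without understanding 
the row payloads, and the payload bytes stay intact for anyone who does want 
to inspect them.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// appendLengthPrefixed appends a varint byte count and then the payload.
// Hypothetical helper for illustration only.
func appendLengthPrefixed(dst, payload []byte) []byte {
	var hdr [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(hdr[:], uint64(len(payload)))
	dst = append(dst, hdr[:n]...)
	return append(dst, payload...)
}

// splitElements walks a stream of length-prefixed elements and returns
// each payload untouched, without interpreting its contents.
func splitElements(stream []byte) ([][]byte, error) {
	var out [][]byte
	for len(stream) > 0 {
		size, n := binary.Uvarint(stream)
		if n <= 0 {
			return nil, errors.New("malformed length prefix")
		}
		stream = stream[n:]
		if size > uint64(len(stream)) {
			return nil, errors.New("length prefix exceeds remaining bytes")
		}
		out = append(out, stream[:int(size)])
		stream = stream[int(size):]
	}
	return out, nil
}

func main() {
	// Made-up row payloads standing in for row-encoded values.
	rows := [][]byte{
		{0x02, 0x00, 0x03, 'f', 'o', 'o'},
		{0x01, 0x00, 0x2a},
	}
	var stream []byte
	for _, r := range rows {
		stream = appendLengthPrefixed(stream, r)
	}
	got, err := splitElements(stream)
	fmt.Printf("elements=%d err=%v\n", len(got), err)
}
```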



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
