Robert Burke created BEAM-10056:
-----------------------------------

             Summary: Side Input Validation too tight, doesn't allow CoGBK
                 Key: BEAM-10056
                 URL: https://issues.apache.org/jira/browse/BEAM-10056
             Project: Beam
          Issue Type: Bug
          Components: sdk-go
            Reporter: Robert Burke
            Assignee: Robert Burke


The following doesn't pass validation, though it should: it's a valid 
signature for a ParDo accepting a PCollection<CoGBK<string, *clientHistory, 
*clientHistory>>.

func (fn *writer) StartBundle(ctx context.Context) error

func (fn *writer) ProcessElement(
ctx context.Context,
key string,
iter1, iter2 func(**clientHistory) bool)

func (fn *writer) FinishBundle(ctx context.Context)

It returns an error:

Missing side inputs in the StartBundle method of a DoFn. If side inputs are 
present in ProcessElement those side inputs must also be present in StartBundle.
Full error:
        inserting ParDo in scope root:
        graph.AsDoFn: for Fn named <...pii...>/userpackage.writer:
side inputs expected in method StartBundle [recovered]
        panic: Missing side inputs in the StartBundle method of a DoFn. If side 
inputs are present in ProcessElement those side inputs must also be present in 
StartBundle.
Full error:
        inserting ParDo in scope root:
        graph.AsDoFn: for Fn named <...pii...>/userpackage.writer:
side inputs expected in method StartBundle


This is happening in the input-unaware validation, which means that check 
needs to be loosened, with the side input count validated elsewhere, where 
the input is known.
https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/graph/fn.go#L527

There are "sibling" cases for the DoFn signature:

func (fn *writer) StartBundle(context.Context, side func(**clientHistory) bool) 
error

func (fn *writer) ProcessElement(
ctx context.Context,
key string,
iter, side func(**clientHistory) bool)

func (fn *writer) FinishBundle(context.Context, side func(**clientHistory) bool)

and

func (fn *writer) StartBundle(context.Context, side1, side2 
func(**clientHistory) bool) error

func (fn *writer) ProcessElement(
ctx context.Context,
key string,
side1, side2 func(**clientHistory) bool)

func (fn *writer) FinishBundle( context.Context, side1, side2 
func(**clientHistory) bool)

These would be for <CoGBK<string, *clientHistory>> with <*clientHistory> on 
the side, and <string> with <*clientHistory> and <*clientHistory> on the side, 
respectively.

Which case applies is only fully determinable with the input, so the check 
should provide a clear error when PCollection binding is occurring.
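
A rough sketch of what an input-aware check could look like, to make the 
ambiguity concrete. This is not the SDK's code: mainInput, sideInputCount and 
checkBundleMethod are hypothetical names, and the sketch only counts 
iterator-shaped parameters rather than inspecting real types. It assumes the 
rule from the error message above, that StartBundle/FinishBundle must declare 
the same side inputs as ProcessElement.

```go
package main

import "fmt"

// mainInput is a hypothetical, simplified description of the PCollection
// bound as the main input of a ParDo. It is not a Beam SDK type.
type mainInput struct {
	desc         string // human-readable type, used in error messages only
	valueStreams int    // grouped value streams of a CoGBK; 0 for a plain element
}

// sideInputCount returns how many of the iterator-shaped parameters of
// ProcessElement are side inputs once the main input is known. The first
// in.valueStreams iterators are consumed by the CoGBK itself.
func sideInputCount(processIters int, in mainInput) (int, error) {
	if processIters < in.valueStreams {
		return 0, fmt.Errorf("input %s needs %d iterators, ProcessElement has %d",
			in.desc, in.valueStreams, processIters)
	}
	return processIters - in.valueStreams, nil
}

// checkBundleMethod verifies that StartBundle or FinishBundle declares as
// many side inputs as ProcessElement does for the given main input.
func checkBundleMethod(method string, processIters, bundleIters int, in mainInput) error {
	want, err := sideInputCount(processIters, in)
	if err != nil {
		return err
	}
	if bundleIters != want {
		return fmt.Errorf("%s declares %d side inputs, but input %s implies %d",
			method, bundleIters, in.desc, want)
	}
	return nil
}

func main() {
	// The same ProcessElement shape (two iterator parameters) under the
	// three bindings described in this issue, plus one mismatched binding.
	cases := []struct {
		bundleIters int
		in          mainInput
	}{
		{0, mainInput{"CoGBK<string, *clientHistory, *clientHistory>", 2}},
		{1, mainInput{"CoGBK<string, *clientHistory>", 1}},
		{2, mainInput{"string", 0}},
		{0, mainInput{"string", 0}}, // StartBundle is missing two side inputs
	}
	for _, c := range cases {
		err := checkBundleMethod("StartBundle", 2, c.bundleIters, c.in)
		fmt.Printf("%s, StartBundle iterators=%d: err=%v\n", c.in.desc, c.bundleIters, err)
	}
}
```

The point of the sketch is that the same ProcessElement signature is valid or 
invalid depending on the bound input, so the count cannot be checked from the 
method signatures alone.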



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
