lostluck commented on a change in pull request #10991: [BEAM-3301] Refactor
DoFn validation & allow specifying main inputs.
URL: https://github.com/apache/beam/pull/10991#discussion_r393382209
##########
File path: sdks/go/pkg/beam/core/graph/fn.go
##########
@@ -239,52 +279,50 @@ func AsDoFn(fn *Fn) (*DoFn, error) {
return nil, addContext(err, fn)
}
- // Start validating DoFn. First, check that ProcessElement has a main input.
+ // Validate ProcessElement has correct number of main inputs (as indicated by
+ // numMainIn), and that main inputs are before side inputs.
processFn := fn.methods[processElementName]
- pos, num, ok := processFn.Inputs()
- if ok {
- first := processFn.Param[pos].Kind
- if first != funcx.FnValue {
- err := errors.New("side input parameters must follow main input parameter")
- err = errors.SetTopLevelMsgf(err,
- "Method %v of DoFns should always have a main input before side inputs, "+
- "but it has side inputs (as Iters or ReIters) first in DoFn %v.",
- processElementName, fn.Name())
- err = errors.WithContextf(err, "method %v", processElementName)
- return nil, addContext(err, fn)
- }
+ if err := validateMainInputs(fn, processFn, processElementName, numMainIn); err != nil {
+ return nil, addContext(err, fn)
+ }
+
+ // If numMainIn is unknown, we can try inferring it from the second input in ProcessElement.
+ // If there is none, or it's not a FnValue type, then we can safely infer that there's only
+ // one main input.
+ pos, num, _ := processFn.Inputs()
+ if numMainIn == MainUnknown && (num == 1 || processFn.Param[pos+1].Kind != funcx.FnValue) {
+ numMainIn = MainSingle
}
// If the ProcessElement function includes side inputs or emit functions those must also be
Review comment:
At the most relaxed, we could either not require them at all when none are
used, or distinguish them by their types. All instances of a side input or
emit with the same type would need to be listed together, since otherwise we
have no way to distinguish them except by position. Permitting nothing to be
set would be the most convenient, or permitting only the side inputs to be
set and not requiring the emits.
For now, though, it's better to be strict and relax later, since the inverse
is not possible without breaking users, and such variety is harder to maintain
if unnecessary.
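
To illustrate the "distinguish them by their types" point, here is a minimal
sketch of how repeated types among side inputs and emits could be detected. The
param struct, its Kind and Type fields, and ambiguousTypes are hypothetical
names made up for this example; they are not part of the Beam Go SDK or of
this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// param is a hypothetical stand-in for a ProcessElement parameter that
// follows the main inputs. It is NOT Beam's funcx.FnParam.
type param struct {
	Kind string // "side" or "emit" (illustrative labels only)
	Type string // printed Go type, e.g. "func(*string) bool"
}

// ambiguousTypes returns, in sorted order, every kind/type pair that occurs
// more than once. Parameters sharing a kind and type can only be told apart
// by position, so they would all need to be listed together.
func ambiguousTypes(params []param) []string {
	counts := make(map[string]int)
	for _, p := range params {
		counts[p.Kind+" "+p.Type]++
	}
	var out []string
	for key, n := range counts {
		if n > 1 {
			out = append(out, key)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	params := []param{
		{Kind: "side", Type: "func(*string) bool"},
		{Kind: "side", Type: "func(*string) bool"},
		{Kind: "emit", Type: "func(int)"},
	}
	// The two identically typed side inputs are reported as ambiguous;
	// the single emit is not.
	fmt.Println(ambiguousTypes(params))
}
```

A relaxed scheme would only need positional listing for the pairs this kind of
check reports; the strict scheme avoids the question by requiring everything
to be listed.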