Robert Burke created BEAM-11099:
-----------------------------------

             Summary: Go SDK Custom - Pre-Processing of SideInput data.
                 Key: BEAM-11099
                 URL: https://issues.apache.org/jira/browse/BEAM-11099
             Project: Beam
          Issue Type: Wish
          Components: sdk-go
            Reporter: Robert Burke


An idea borrowed from Python: allow users to specify a way to pre-process side 
input data on first use, and leverage the caching to reuse the result. This can 
simplify user DoFns by letting them convert their side input data (mostly 
lists) into a form better suited to their access pattern. 
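
To make the idea concrete, here is a minimal sketch in plain Go of "convert on 
first use, then reuse". It does not use any Beam API: the KV type, the 
cachedLookup wrapper and its get method are hypothetical names invented for 
this illustration, and the cache here lives only as long as the wrapper value, 
which is an assumption rather than how the SDK's caching would behave.

```go
package main

import (
	"fmt"
	"sync"
)

// KV is a hypothetical element type, standing in for what a list side
// input might contain.
type KV struct {
	Key   string
	Value int
}

// cachedLookup is a hypothetical wrapper that converts a list side input
// into a map the first time it is used and reuses the map afterwards.
type cachedLookup struct {
	once  sync.Once
	index map[string]int
	built int // counts how often the conversion ran, for illustration only
}

// get builds the index on first use, then answers lookups from it.
func (c *cachedLookup) get(sideInput []KV, key string) (int, bool) {
	c.once.Do(func() {
		c.built++
		c.index = make(map[string]int, len(sideInput))
		for _, kv := range sideInput {
			c.index[kv.Key] = kv.Value
		}
	})
	v, ok := c.index[key]
	return v, ok
}

func main() {
	side := []KV{{"a", 1}, {"b", 2}}
	var c cachedLookup
	for _, k := range []string{"a", "b", "c"} {
		v, ok := c.get(side, k)
		fmt.Println(k, v, ok)
	}
	fmt.Println("conversions:", c.built)
}
```

Without this kind of pre-processing, each element would scan the list again, 
which is the cost the proposal aims to avoid.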

It is strongly recommended to add Map Side Inputs 
(https://issues.apache.org/jira/browse/BEAM-3293) before implementing this 
suggestion, and caching 
(https://issues.apache.org/jira/browse/BEAM-11097) must already be implemented. 
Otherwise very little benefit is achieved.

See https://issues.apache.org/jira/browse/BEAM-3293 for where code might need 
to be changed.

In particular, it would require a mechanism for the SDK to determine that a 
given unknown type actually represents a side input, and a method by which to 
pre-process the data associated with it. 
Positional handling would need to be maintained to identify the types of side 
inputs for pipeline type checking.
Some "magic method", similar to the structural DoFn methods, is likely the 
right approach. However, it's an open question how to make this scale properly 
to more than a single side input.
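
One possible shape for the detection step, sketched with only the standard 
reflect package. Everything named here is an assumption made for illustration: 
the method name PreProcessSideInput, the WordSet type, and the helpers 
sideInputElemType, sideInputPositions and build are not part of the Go SDK. 
The sketch treats a parameter type as a side input if it declares the 
hypothetical method taking a single slice, and reads the element type from 
that slice so the parameter position can still be type checked.

```go
package main

import (
	"fmt"
	"reflect"
)

// magicMethod is a hypothetical method name; the issue does not fix one.
const magicMethod = "PreProcessSideInput"

// WordSet is a hypothetical user type holding pre-processed side input data.
type WordSet struct {
	words map[string]bool
}

// PreProcessSideInput converts the raw list side input into a set.
func (w *WordSet) PreProcessSideInput(raw []string) {
	w.words = make(map[string]bool, len(raw))
	for _, s := range raw {
		w.words[s] = true
	}
}

// sideInputElemType reports whether t declares the magic method with a
// single slice parameter, and if so returns the slice's element type.
func sideInputElemType(t reflect.Type) (reflect.Type, bool) {
	m, ok := t.MethodByName(magicMethod)
	if !ok {
		return nil, false
	}
	// For a method obtained from a Type, parameter 0 is the receiver.
	if m.Type.NumIn() != 2 || m.Type.NumOut() != 0 {
		return nil, false
	}
	in := m.Type.In(1)
	if in.Kind() != reflect.Slice {
		return nil, false
	}
	return in.Elem(), true
}

// sideInputPositions maps each parameter position of fn that looks like a
// pre-processed side input to the element type of the underlying list.
func sideInputPositions(fn interface{}) map[int]reflect.Type {
	ft := reflect.TypeOf(fn)
	out := make(map[int]reflect.Type)
	for i := 0; i < ft.NumIn(); i++ {
		if elem, ok := sideInputElemType(ft.In(i)); ok {
			out[i] = elem
		}
	}
	return out
}

// build allocates a value of the same pointer type as proto and runs the
// magic method on it with the raw side input data.
func build(proto interface{}, raw interface{}) interface{} {
	v := reflect.New(reflect.TypeOf(proto).Elem())
	v.MethodByName(magicMethod).Call([]reflect.Value{reflect.ValueOf(raw)})
	return v.Interface()
}

func main() {
	// A stand-in for a DoFn's ProcessElement with one side input parameter.
	process := func(word string, set *WordSet) {}
	for pos, elem := range sideInputPositions(process) {
		fmt.Println("side input at position", pos, "with element type", elem)
	}
	ws := build((*WordSet)(nil), []string{"a", "b"}).(*WordSet)
	fmt.Println(ws.words["a"], ws.words["c"])
}
```

Because detection is per parameter type, several such parameters could in 
principle be found in one signature, but this sketch does not settle how that 
should interact with caching or with the rest of the SDK's type checking.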
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
