Robert Burke created BEAM-11099:
-----------------------------------
Summary: Go SDK Custom - Pre-Processing of SideInput data.
Key: BEAM-11099
URL: https://issues.apache.org/jira/browse/BEAM-11099
Project: Beam
Issue Type: Wish
Components: sdk-go
Reporter: Robert Burke
An idea borrowed from Python: allow users to specify a way to pre-process side
input data on first use, and leverage the caching. This can simplify user DoFns
by allowing them to convert their side input data (mostly lists) into a form
better suited to their access pattern.
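As a rough illustration of the idea, and not of any existing Beam API, the sketch below uses only the Go standard library. The names KV, cachedLookup, and Get are hypothetical. It converts a list-shaped side input into a map the first time it is read and reuses that map afterwards.

```go
package main

import (
	"fmt"
	"sync"
)

// KV is a hypothetical key/value element, standing in for one entry
// of a list side input.
type KV struct {
	Key   string
	Value int
}

// cachedLookup is a hypothetical wrapper (not a Beam type). It holds the
// raw side input list and builds a map from it on first use.
type cachedLookup struct {
	raw    []KV
	once   sync.Once
	index  map[string]int
	builds int // counts how many times the pre-processing ran
}

// Get pre-processes the raw list exactly once, then serves lookups
// from the cached map.
func (c *cachedLookup) Get(key string) (int, bool) {
	c.once.Do(func() {
		c.index = make(map[string]int, len(c.raw))
		for _, kv := range c.raw {
			c.index[kv.Key] = kv.Value
		}
		c.builds++
	})
	v, ok := c.index[key]
	return v, ok
}

func main() {
	side := &cachedLookup{raw: []KV{{"a", 1}, {"b", 2}}}
	for _, k := range []string{"a", "b", "a", "missing"} {
		v, ok := side.Get(k)
		fmt.Println(k, v, ok)
	}
	fmt.Println("builds:", side.builds)
}
```

Without caching across elements or bundles, the conversion would be repeated each time the side input is materialized, which is why the caching work matters for this suggestion.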
It is strongly recommended to add Map Side Inputs
(https://issues.apache.org/jira/browse/BEAM-3293) before implementing this
suggestion, and caching (https://issues.apache.org/jira/browse/BEAM-11097) must
be implemented first. Otherwise very little benefit is achieved.
See https://issues.apache.org/jira/browse/BEAM-3293 for where code might need
to be changed.
In particular, it would require a mechanism for the SDK to determine that a
given unknown type is actually representing a side input, and a method by which
to pre-process the data associated with it.
Positional handling would be expected to be maintained, so that the types of
side inputs can still be identified for pipeline type checking.
Some "magic method", similar to how the structural DoFn methods work, is likely
the right approach. However, it's an open question how to make this scale
properly to more than a single side input.
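One possible shape for such a mechanism is sketched below, using only the standard reflect package. Everything here is an assumption made for illustration: the hook name PreProcess, the types stopWords and weights, and the helpers hookElem and sideInputs are hypothetical and are not part of the Go SDK. The sketch puts the hook on each side input's own type rather than on the DoFn, so each parameter position carries its own pre-processing and raw element type.

```go
package main

import (
	"fmt"
	"reflect"
)

// hookName is a hypothetical method name; the issue does not specify one.
const hookName = "PreProcess"

// stopWords is a hypothetical user type backed by a list of strings.
type stopWords struct{ set map[string]bool }

// PreProcess converts the raw list into a set.
func (s *stopWords) PreProcess(raw []string) {
	s.set = make(map[string]bool, len(raw))
	for _, w := range raw {
		s.set[w] = true
	}
}

// weights is a second hypothetical user type backed by a list of floats.
type weights struct{ byIndex map[int]float64 }

// PreProcess indexes the raw list by position.
func (w *weights) PreProcess(raw []float64) {
	w.byIndex = make(map[int]float64, len(raw))
	for i, v := range raw {
		w.byIndex[i] = v
	}
}

// hookElem reports the element type of the raw list accepted by t's hook,
// or false if t has no hook taking exactly one slice argument.
func hookElem(t reflect.Type) (reflect.Type, bool) {
	m, ok := t.MethodByName(hookName)
	if !ok {
		return nil, false
	}
	// For a concrete type, m.Type lists the receiver as the first input.
	if m.Type.NumIn() != 2 || m.Type.In(1).Kind() != reflect.Slice {
		return nil, false
	}
	return m.Type.In(1).Elem(), true
}

// sideInputs maps each parameter position of fn that looks like a
// pre-processed side input to its raw element type. fn must be a function.
func sideInputs(fn interface{}) map[int]reflect.Type {
	ft := reflect.TypeOf(fn)
	found := make(map[int]reflect.Type)
	for i := 0; i < ft.NumIn(); i++ {
		if elem, ok := hookElem(ft.In(i)); ok {
			found[i] = elem
		}
	}
	return found
}

// score stands in for a DoFn's processing function with two side inputs.
func score(word string, stop *stopWords, w *weights) float64 {
	if stop.set[word] {
		return 0
	}
	return w.byIndex[len(word)]
}

func main() {
	for pos, elem := range sideInputs(score) {
		fmt.Printf("param %d: side input with raw element type %v\n", pos, elem)
	}
}
```

Because the raw element type is recovered per parameter position, positional type checking could in principle still be done against the upstream PCollection types. Whether this fits the SDK's existing reflection and code generation paths is not addressed by this sketch.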
--
This message was sent by Atlassian Jira
(v8.3.4#803005)