[
https://issues.apache.org/jira/browse/BEAM-11099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Burke updated BEAM-11099:
--------------------------------
Description:
An idea borrowed from python: Allow users to specify a way to pre-process side
input data on first use, and leverage the caching. This can simplify user DoFns
by allowing them to convert their side input data (mostly lists) into a more
useful form for their access pattern.
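As a rough illustration of the pattern this would replace, here is a minimal
sketch in plain Go of what a user DoFn has to do by hand today: build a lookup
structure from list-shaped side input on first use and reuse it afterwards.
The type lookupFn, its fields, and the builds counter are hypothetical names
for this example only, and the sketch assumes the side input is identical on
every call (for example, a single global window), which a real implementation
could not assume.

```go
package main

import "sync"

// lookupFn stands in for a user DoFn. All names here are hypothetical.
type lookupFn struct {
	once   sync.Once
	index  map[string]int // pre-processed form of the side input
	builds int            // counts how often the index was built
}

// ProcessElement receives the main element and the side input as a slice.
// The slice is converted to a map on the first call only; later calls
// reuse the map. Assumption: the side input does not change between calls.
func (f *lookupFn) ProcessElement(word string, side []string) int {
	f.once.Do(func() {
		f.index = make(map[string]int, len(side))
		for i, s := range side {
			f.index[s] = i
		}
		f.builds++
	})
	if pos, ok := f.index[word]; ok {
		return pos
	}
	return -1
}
```

Every DoFn that wants this behaviour has to repeat the same bookkeeping, which
is the boilerplate the proposal aims to move into the SDK.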
It is strongly recommended to add Map Side Inputs
https://issues.apache.org/jira/browse/BEAM-3293 before implementing this
suggestion, and required to have caching implemented
https://issues.apache.org/jira/browse/BEAM-11097. Otherwise very little benefit
is achieved.
See https://issues.apache.org/jira/browse/BEAM-3293 for where code might need
to be changed.
In particular, it would require a mechanism for the SDK to determine that a
given unknown type is actually representing a side input, and a method by which
to pre-process the data associated with it.
Positional handling is expected to be maintained, so that the types of side
inputs can still be identified for pipeline type checking.
Some "magic method", similar to how the structural DoFn methods work, is likely
the right approach. However, it is an open question how to make this scale
properly to more than a single side input. Otherwise, perhaps something that
takes in a valid side input form and returns a single value to be used instead?
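The last alternative could look roughly like the following sketch in plain Go.
It is not an existing SDK API: cachedSide, Get, sliceIter, and toSet are
hypothetical names, and the sketch assumes the side input arrives in the
iterator form func(*T) bool. The user supplies one function that consumes the
side input and returns a single value, and a wrapper caches that value after
the first call. It is not safe for concurrent use and ignores windowing.

```go
package main

// cachedSide pairs a user-supplied pre-processing function with the value
// it produced. Hypothetical type, not part of any SDK.
type cachedSide[T, V any] struct {
	prep  func(iter func(*T) bool) V
	value V
	ready bool
}

// Get runs prep on the first call and returns the cached value afterwards.
// The iterator passed on later calls is ignored.
func (c *cachedSide[T, V]) Get(iter func(*T) bool) V {
	if !c.ready {
		c.value = c.prep(iter)
		c.ready = true
	}
	return c.value
}

// sliceIter adapts a slice to the iterator form, for demonstration only.
func sliceIter[T any](xs []T) func(*T) bool {
	i := 0
	return func(out *T) bool {
		if i >= len(xs) {
			return false
		}
		*out = xs[i]
		i++
		return true
	}
}

// toSet is an example pre-processing function: it drains the iterator
// into a set for constant-time membership checks.
func toSet(iter func(*string) bool) map[string]bool {
	set := make(map[string]bool)
	var s string
	for iter(&s) {
		set[s] = true
	}
	return set
}
```

With one such wrapper per side input, the scaling question above becomes a
question of how the SDK would associate each wrapper with its positional
parameter, which this sketch does not attempt to answer.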
was:
An idea borrowed from python: Allow users to specify a way to pre-process side
input data on first use, and leverage the caching. This can simplify user DoFns
by allowing them to convert their side input data (mostly lists) into a more
useful form for their access pattern.
It is strongly recommended to add Map Side Inputs
https://issues.apache.org/jira/browse/BEAM-3293 before implementing this
suggestion, and required to have caching implemented
https://issues.apache.org/jira/browse/BEAM-11097. Otherwise very little benefit
is achieved.
See https://issues.apache.org/jira/browse/BEAM-3293 for where code might need
to be changed.
In particular, it would require a mechanism for the SDK to determine that a
given unknown type is actually representing a side input, and a method by which
to pre-process the data associated with it.
Positional handling is expected to be maintained, so that the types of side
inputs can still be identified for pipeline type checking.
Some "magic method", similar to how the structural DoFn methods work, is likely
the right approach. However, it is an open question how to make this scale
properly to more than a single side input. Otherwise, perhaps something that
takes in a valid side input form and returns a single value to be used instead?
Due to
> Go SDK Custom - Pre-Processing of SideInput data.
> -------------------------------------------------
>
> Key: BEAM-11099
> URL: https://issues.apache.org/jira/browse/BEAM-11099
> Project: Beam
> Issue Type: Wish
> Components: sdk-go
> Reporter: Robert Burke
> Priority: P4
>
> An idea borrowed from python: Allow users to specify a way to pre-process
> side input data on first use, and leverage the caching. This can simplify
> user DoFns by allowing them to convert their side input data (mostly lists)
> into a more useful form for their access pattern.
> It is strongly recommended to add Map Side Inputs
> https://issues.apache.org/jira/browse/BEAM-3293 before implementing this
> suggestion, and required to have caching implemented
> https://issues.apache.org/jira/browse/BEAM-11097. Otherwise very little
> benefit is achieved.
> See https://issues.apache.org/jira/browse/BEAM-3293 for where code might need
> to be changed.
> In particular, it would require a mechanism for the SDK to determine that a
> given unknown type is actually representing a side input, and a method by
> which to pre-process the data associated with it.
> Positional handling is expected to be maintained, so that the types of side
> inputs can still be identified for pipeline type checking.
> Some "magic method", similar to how the structural DoFn methods work, is
> likely the right approach. However, it is an open question how to make this
> scale properly to more than a single side input. Otherwise, perhaps something
> that takes in a valid side input form and returns a single value to be used
> instead?
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)