[ 
https://issues.apache.org/jira/browse/BEAM-11099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke updated BEAM-11099:
--------------------------------
    Description: 
An idea borrowed from Python: allow users to specify a way to pre-process side 
input data on first use, and leverage the caching. This can simplify user DoFns 
by allowing them to convert their side input data (mostly lists) into a more 
useful form for their access pattern. 

It is strongly recommended to add Map Side Inputs 
https://issues.apache.org/jira/browse/BEAM-3293 before implementing this 
suggestion, and caching must already be implemented 
https://issues.apache.org/jira/browse/BEAM-11097. Otherwise very little benefit 
is achieved.

See https://issues.apache.org/jira/browse/BEAM-3293 for where code might need 
to be changed.

In particular, it would require a mechanism for the SDK to determine that a 
given unknown type actually represents a side input, and a method by which 
to pre-process the data associated with it. 
Positional handling would be expected to remain, so that the type of each side 
input can still be identified for pipeline type checking.
Some "magic method", similar to how the structural DoFn methods work, is likely 
the right approach; however, it's an open question how to make this scale 
properly to more than a single side input. Otherwise, perhaps something that 
takes in a valid side input form and returns a single value to be used instead?
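
As a rough illustration of that last option, the following is a minimal sketch in plain Go, not a proposed SDK API. It assumes the side input arrives in the iterator form func(*T) bool, and every name in it (KV, Cached, NewCached, toIndex, sliceIter) is hypothetical. It also ignores windowing: a real implementation would presumably need to cache per window, relying on the caching from BEAM-11097.

```go
package main

import (
	"fmt"
	"sync"
)

// KV is a hypothetical stand-in for a keyed side input element.
// It is not a Beam type.
type KV struct {
	Key   string
	Value int
}

// Cached wraps a user-supplied pre-processing function so that it runs
// at most once, on first use. Hypothetical; not part of the Beam Go SDK.
type Cached[In, Out any] struct {
	once    sync.Once
	convert func(In) Out
	value   Out
}

// NewCached returns a Cached that will apply convert on the first Get.
func NewCached[In, Out any](convert func(In) Out) *Cached[In, Out] {
	return &Cached[In, Out]{convert: convert}
}

// Get converts the side input on the first call and returns the cached
// value on every later call; later inputs are ignored.
func (c *Cached[In, Out]) Get(in In) Out {
	c.once.Do(func() { c.value = c.convert(in) })
	return c.value
}

// toIndex drains an iterator-style side input into a map, which suits
// a lookup-by-key access pattern better than a list does.
func toIndex(next func(*KV) bool) map[string]int {
	index := make(map[string]int)
	var kv KV
	for next(&kv) {
		index[kv.Key] = kv.Value
	}
	return index
}

// sliceIter adapts a slice to the iterator form, for demonstration only.
func sliceIter(items []KV) func(*KV) bool {
	i := 0
	return func(out *KV) bool {
		if i >= len(items) {
			return false
		}
		*out = items[i]
		i++
		return true
	}
}

func main() {
	conversions := 0
	index := NewCached(func(next func(*KV) bool) map[string]int {
		conversions++
		return toIndex(next)
	})

	data := []KV{{"a", 1}, {"b", 2}}
	// Each loop iteration plays the role of one ProcessElement call.
	for _, key := range []string{"a", "b", "a"} {
		fmt.Println(key, index.Get(sliceIter(data))[key])
	}
	fmt.Println("conversions:", conversions)
}
```

The conversion function here takes one side input form and returns one value, so it does not by itself answer how to handle more than a single side input; that remains the open question above.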
 

  was:
An idea borrowed from Python: allow users to specify a way to pre-process side 
input data on first use, and leverage the caching. This can simplify user DoFns 
by allowing them to convert their side input data (mostly lists) into a more 
useful form for their access pattern. 

It is strongly recommended to add Map Side Inputs 
https://issues.apache.org/jira/browse/BEAM-3293 before implementing this 
suggestion, and caching must already be implemented 
https://issues.apache.org/jira/browse/BEAM-11097. Otherwise very little benefit 
is achieved.

See https://issues.apache.org/jira/browse/BEAM-3293 for where code might need 
to be changed.

In particular, it would require a mechanism for the SDK to determine that a 
given unknown type actually represents a side input, and a method by which 
to pre-process the data associated with it. 
Positional handling would be expected to remain, so that the type of each side 
input can still be identified for pipeline type checking.
Some "magic method", similar to how the structural DoFn methods work, is likely 
the right approach; however, it's an open question how to make this scale 
properly to more than a single side input. Otherwise, perhaps something that 
takes in a valid side input form and returns a single value to be used instead?

Due to
 


> Go SDK Custom - Pre-Processing of SideInput data.
> -------------------------------------------------
>
>                 Key: BEAM-11099
>                 URL: https://issues.apache.org/jira/browse/BEAM-11099
>             Project: Beam
>          Issue Type: Wish
>          Components: sdk-go
>            Reporter: Robert Burke
>            Priority: P4
>
> An idea borrowed from Python: allow users to specify a way to pre-process 
> side input data on first use, and leverage the caching. This can simplify 
> user DoFns by allowing them to convert their side input data (mostly lists) 
> into a more useful form for their access pattern. 
> It is strongly recommended to add Map Side Inputs 
> https://issues.apache.org/jira/browse/BEAM-3293 before implementing this 
> suggestion, and caching must already be implemented 
> https://issues.apache.org/jira/browse/BEAM-11097. Otherwise very little 
> benefit is achieved.
> See https://issues.apache.org/jira/browse/BEAM-3293 for where code might need 
> to be changed.
> In particular, it would require a mechanism for the SDK to determine that a 
> given unknown type actually represents a side input, and a method by 
> which to pre-process the data associated with it. 
> Positional handling would be expected to remain, so that the type of each 
> side input can still be identified for pipeline type checking.
> Some "magic method", similar to how the structural DoFn methods work, is 
> likely the right approach; however, it's an open question how to make this 
> scale properly to more than a single side input. Otherwise, perhaps 
> something that takes in a valid side input form and returns a single value 
> to be used instead?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
