No, the list of possible parameters is not known in advance. The simplest example is that state cells, side inputs, and timers have ids so there are infinitely many possibilities. The presence of these parameters also changes how the DoFn must be executed. Basic ParDo, stateful ParDo, and splittable ParDo are essentially totally separate primitives.
By design, in the user-facing APIs we emphasize being able to "statically" analyze their annotations, so having a generic API to be able to do things dynamically is not a goal. We do offer TimerFamily and [Multi]MapState, but notably not a fully dynamic API. Just those things that are obviously useful. These are naturally less able to be validated and optimized. Having a generalized API so that you can implement a DSL on top of Beam's primitives is simply a different use case. They are essentially analogous to defining classes by code vs defining classes by invoking reflective APIs. Both are useful, but they are both more useful when separated. Kenn On Wed, Mar 23, 2022 at 6:02 AM Daniel Collins <[email protected]> wrote: > Sure- but if there is a use case that exists in the wild that is not > covered by the existing state and timers APIs, I'd argue we should add an > API to handle that use case, not add a workaround to get basically the > ProcessContext back like Kenneth suggested. Is there a way to access states > or timers in a generic way in ProcessContext right now? Perhaps I'm missing > something. > > Providing a strawman for generic state access since we already have the > spec tags accessible: > > ``` > interface StateContext { > <StateT extends State> StateT get(StateSpec<StateT> spec); > } > ``` > > -Daniel > > On Wed, Mar 23, 2022 at 1:30 AM Reuven Lax <[email protected]> wrote: > >> This is not true for state and timers today. >> >> On Tue, Mar 22, 2022 at 9:57 PM Daniel Collins <[email protected]> >> wrote: >> >>> Pushing back on that Kenneth- aren't the list of potential parameters >>> known beforehand? I.e. if I wanted to write a fully generic DoFn, I could >>> just accept all of them to my @ProcessElement method. And, I could >>> presumably just pass unconstrained generics or Object for any types I >>> couldn't get the right type for in my generic framework without issue. >>> >>> -Daniel >>> >>> On Tue, Mar 22, 2022 at 9:17 PM Kenneth Knowles <[email protected]> wrote: >>> >>>> Another aspect to what Reuven brought up is that it is quite difficult >>>> to dynamically create a DoFn now that it requires annotations or magic >>>> parameters for everything. For example if I have my own DSL and I want to >>>> compile it to a DoFn where the state / parameters of the DoFn depend on the >>>> input being compiled. My proposal for that use case is to have a variant of >>>> the ParDo primitive something like a ParDo.of(DoFnInvoker). The >>>> "parameters" bundle passed to a DoFnInvoker is essentially the same thing >>>> as ye olde ProcessContext. >>>> >>>> To be clear: I absolutely support deprecating ProcessContext. I think >>>> addressing the missing use case(s) in other ways is way better than keeping >>>> it around. >>>> >>>> Kenn >>>> >>>> On Thu, Mar 10, 2022 at 3:26 PM Luke Cwik <[email protected]> wrote: >>>> >>>>> Another advantage is that once you know the parameters you can replace >>>>> the execution time wrappers with lighter weight wrappers that don't have >>>>> to >>>>> create and manage many expensive objects (e.g. replace FnApiDoFnRunner >>>>> with >>>>> MapFnRunner). >>>>> >>>>> On Thu, Mar 10, 2022 at 1:05 PM Robert Bradshaw <[email protected]> >>>>> wrote: >>>>> >>>>>> On Thu, Mar 10, 2022 at 11:45 AM Andrew Pilloud <[email protected]> >>>>>> wrote: >>>>>> >>>>>>> Thanks for noticing that. It does look like we are missing this use >>>>>>> case for SideInputs. We can add a MultiSideInput interface that allows >>>>>>> fetching arbitrary side inputs, I opened >>>>>>> https://issues.apache.org/jira/browse/BEAM-14085 >>>>>>> >>>>>> >>>>>> +1 >>>>>> >>>>>> We can, of course, start de-emphasizing it in our docs, etc. for >>>>>> standard uscases sooner rather than later. >>>>>> >>>>>> >>>>>>> On Wed, Mar 9, 2022 at 4:25 PM Reuven Lax <[email protected]> wrote: >>>>>>> >>>>>>>> One important point - don't forget about systems that use Beam to >>>>>>>> implement other frameworks (e.g. scio, SQL, etc.). These use cases >>>>>>>> often >>>>>>>> don't statically know the graph a priori, so can't just rely on static >>>>>>>> parameters. e.g. your point about side inputs being solved only works >>>>>>>> for >>>>>>>> statically known side inputs - a framework would want to be able to >>>>>>>> specify >>>>>>>> the which side input to read at run time. >>>>>>>> >>>>>>>> On Wed, Mar 9, 2022 at 4:00 PM Andrew Pilloud <[email protected]> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> I agree that we are missing pieces needed to depreciate >>>>>>>>> FinishBundleContext, however that interface is much smaller >>>>>>>>> (PipelineOptions plus an extended MultiOutputReceiver) so doesn't >>>>>>>>> suffer >>>>>>>>> from the problems ProcessContext does! >>>>>>>>> >>>>>>>>> Andrew >>>>>>>>> >>>>>>>>> On Wed, Mar 9, 2022 at 3:28 PM Robert Bradshaw < >>>>>>>>> [email protected]> wrote: >>>>>>>>> >>>>>>>>>> +1 to deprecating it and removing it from our examples, >>>>>>>>>> tutorials, etc. >>>>>>>>>> >>>>>>>>>> The one thing it could be useful for is forwarding all >>>>>>>>>> parameters, e.g. a wrapper converting a DoFn<V, O> to a DoFn<KV<K, >>>>>>>>>> V>>, >>>>>>>>>> KV<K, O>>, but that's already pretty cumbersome to do. >>>>>>>>>> >>>>>>>>>> On Wed, Mar 9, 2022 at 2:56 PM Andrew Pilloud < >>>>>>>>>> [email protected]> wrote: >>>>>>>>>> >>>>>>>>>>> Hi everyone, >>>>>>>>>>> >>>>>>>>>>> I'm back from baby leave so I thought I'd propose a >>>>>>>>>>> big, controversial change that I've been holding back on: we should >>>>>>>>>>> mark >>>>>>>>>>> DoFn.ProcessContext deprecated in the Java SDK. The use of >>>>>>>>>>> ProcessContext >>>>>>>>>>> prevents optimizations based on DoFn metadata as it creates an >>>>>>>>>>> implicit >>>>>>>>>>> access to both the element and timestamp even if those methods are >>>>>>>>>>> never >>>>>>>>>>> called. >>>>>>>>>>> >>>>>>>>>>> There are better paths that allow users to get at all the same >>>>>>>>>>> information available through ProcessContext without implicit >>>>>>>>>>> accesses >>>>>>>>>>> notably writing a ProcessElement with the appropriate type or >>>>>>>>>>> annotation >>>>>>>>>>> parameters: DoFn.Element (and preferably DoFn.FieldAccess), >>>>>>>>>>> DoFn.SideInput, >>>>>>>>>>> PaneInfo, DoFn.Timestamp, and DoFn.OutputReceiver. Most of these >>>>>>>>>>> were added >>>>>>>>>>> in Beam 2.5.0, but the latest (SideInput) was added in 2.16.0. That >>>>>>>>>>> is to >>>>>>>>>>> say we've supported better options for multiple years. >>>>>>>>>>> >>>>>>>>>>> This is not a proposal to remove ProcessContext (that would be a >>>>>>>>>>> pretty significant breaking change) but I would suggest that we >>>>>>>>>>> don't add >>>>>>>>>>> any new features to ProcessContext, remove uses from Beam outside >>>>>>>>>>> of tests >>>>>>>>>>> targeting ProcessContext, and remove it from tutorials. >>>>>>>>>>> >>>>>>>>>>> I opened a JIRA a few months back: >>>>>>>>>>> https://issues.apache.org/jira/browse/BEAM-13057 Is there >>>>>>>>>>> anything I'm missing, anything that would block this change, or any >>>>>>>>>>> other >>>>>>>>>>> objections? >>>>>>>>>>> >>>>>>>>>>> Andrew >>>>>>>>>>> >>>>>>>>>>
