Yup, it makes sense, it's what I had in mind.
In Apache Camel, in a Processor (similar to a DoFn), we can also bind
expression languages directly to the arguments.
We can imagine something like:
@ProcessElement void process(@JsonPath("foo") String foo)
@ProcessElement void process(@XPath("//foo") String foo)
or even an expression language (simple/groovy/whatever).
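
To make the idea a bit more concrete, here is a rough sketch in plain Java
(JDK only, no Beam or Camel dependency) of how a runner could resolve such an
annotated argument by reflection. Everything in it is hypothetical and only
for illustration: the @XPath and @Process annotations, the MyFn class and the
invoke helper are made up, and this is not how Beam's DoFn invoker is
implemented.

```java
import java.io.StringReader;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class XPathInjectionSketch {

  /** Hypothetical parameter annotation carrying an XPath expression (not a Beam API). */
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.PARAMETER)
  @interface XPath {
    String value();
  }

  /** Hypothetical stand-in for @ProcessElement. */
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.METHOD)
  @interface Process {}

  /** A user function that declares only the value it needs from the element. */
  static class MyFn {
    String seen;

    @Process
    public void process(@XPath("//foo") String foo) {
      seen = foo;
    }
  }

  /**
   * Finds the @Process method, evaluates each parameter's XPath expression
   * against the XML element, and invokes the method with the results.
   */
  static void invoke(Object fn, String xmlElement) throws Exception {
    for (Method m : fn.getClass().getDeclaredMethods()) {
      if (!m.isAnnotationPresent(Process.class)) {
        continue;
      }
      Parameter[] params = m.getParameters();
      Object[] args = new Object[params.length];
      for (int i = 0; i < params.length; i++) {
        XPath expr = params[i].getAnnotation(XPath.class);
        if (expr == null) {
          throw new IllegalArgumentException("Parameter " + i + " has no @XPath annotation");
        }
        // Evaluate the expression as a string against a fresh parse of the element.
        args[i] = XPathFactory.newInstance().newXPath()
            .evaluate(expr.value(), new InputSource(new StringReader(xmlElement)));
      }
      m.setAccessible(true);
      m.invoke(fn, args);
    }
  }

  public static void main(String[] args) throws Exception {
    MyFn fn = new MyFn();
    invoke(fn, "<doc><foo>bar</foo></doc>");
    System.out.println(fn.seen);
  }
}
```

A real implementation would probably compile the expressions once per DoFn
rather than per element, and would need a pluggable way to register
languages, but the user-facing shape would stay the same.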
Regards
JB
On 04/06/2018 16:39, Reuven Lax wrote:
> In the schema branch I have already added some annotations for Schema.
> However in the future I think we could go even further and allow users
> to pick individual fields out of the row schema. e.g. the user might
> have a Schema with 100 fields, but only want to process userId and geo
> location. I could imagine something like this
>
> @ProcessElement void process(@Field("userId") String
> userId, @Field("latitude") double lat, @Field("longitude") double lng) {
> }
>
> And Beam could automatically extract the right fields for the user. In
> fact we could do the same thing with KVs today - supplying annotations
> to automatically unpack the KV.
>
> I do think there are a few nice ways to do side inputs as well, but it's
> more work to design and implement, which is why I left it off (and given that
> there is some design work, side input annotations should be discussed on
> the dev list before implementation IMO).
>
> Reuven
>
> On Mon, Jun 4, 2018 at 5:29 PM Jean-Baptiste Onofré wrote:
>
> Hi Reuven,
>
> That's a great improvement for users.
>
> I don't see an easy way to provide annotations for side inputs/outputs.
> I think we can also plan some extension annotations for schemas, like
> @Element(schema = foo) in addition to the type. Thoughts?
>
> Regards
> JB
>
> On 04/06/2018 16:06, Reuven Lax wrote:
> > Beam was created with an annotation-based processing API, that allows
> > the framework to automatically inject parameters to a DoFn's process
> > method (and also allows the user to mark any method as the process
> > method using @ProcessElement). However, these annotations were never
> > completed. A specific set of parameters could be injected (e.g. the
> > window or PipelineOptions), but for anything else you had to access it
> > through the ProcessContext. This limited the readability advantage of
> > this API.
> >
> > A couple of months ago I spent a bit of time extending the set of
> > annotations allowed. In particular, the most common uses of
> > ProcessContext were accessing the input element and outputting
> > elements, and both of those can now be done without ProcessContext.
> > Example usage:
> >
> > new DoFn<InputT, OutputT>() {
> >   @ProcessElement
> >   public void process(@Element InputT element, OutputReceiver<OutputT> out) {
> >     out.output(convertInputToOutput(element));
> >   }
> > }
> >
> > No need for ProcessContext anywhere in this DoFn! The Beam framework
> > also does type checking - if the @Element type was not InputT, you
> > would have seen an error. Multi-output DoFns also work, using a
> > MultiOutputReceiver interface.
> >
> > I'll update the Beam docs later with this information, but most
> > information accessible from ProcessContext, OnTimerContext,
> > StartBundleContext, or FinishBundleContext can now be accessed via
> > this sort of injection. The main exceptions are side inputs and output
> > from @FinishBundle, both of which still require the context objects;
> > however, I hope to find time to provide direct access to those as well.
> >
> > pr/5331 (in progress) converts most of Beam's built-in transforms to
> > use this clearer style.
> >
> > Reuven
>
> --
> Jean-Baptiste Onofré
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
--
Jean-Baptiste Onofré
http://blog.nanthrax.net
Talend - http://www.talend.com