Re: Some extensions to the DoFn API

Jean-Baptiste Onofré Mon, 04 Jun 2018 07:45:46 -0700

Yup, it makes sense, it's what I had in mind.

In Apache Camel, in a Processor (similar to a DoFn), we can also bind
expression languages directly to the arguments.


We can imagine something like:

@ProcessElement void process(@JsonPath("foo") String foo)

@ProcessElement void process(@XPath("//foo") String foo)

or even an expression language (simple/groovy/whatever).
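
To make the idea concrete, here is a rough sketch of how such annotation-driven argument binding could work, using plain Java reflection only. It is not Beam or Camel code: the @ProcessElement and @Field annotations below are local stand-ins, and the class names, the Map used as a "row", and the invoke helper are all hypothetical. Parameters use boxed types so the simple type check works.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import java.util.HashMap;
import java.util.Map;

class FieldInjectionSketch {

  // Local stand-ins for the annotations discussed in this thread (hypothetical).
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.METHOD)
  @interface ProcessElement {}

  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.PARAMETER)
  @interface Field {
    String value();
  }

  // A user function that only asks for the fields it needs.
  static class GeoFn {
    @ProcessElement
    public String process(
        @Field("userId") String userId,
        @Field("latitude") Double lat,
        @Field("longitude") Double lon) {
      return userId + "@" + lat + "," + lon;
    }
  }

  // Finds the @ProcessElement method and fills each parameter from the row.
  static Object invoke(Object fn, Map<String, Object> row) {
    for (Method m : fn.getClass().getDeclaredMethods()) {
      if (!m.isAnnotationPresent(ProcessElement.class)) {
        continue;
      }
      Parameter[] params = m.getParameters();
      Object[] args = new Object[params.length];
      for (int i = 0; i < params.length; i++) {
        Field field = params[i].getAnnotation(Field.class);
        if (field == null) {
          throw new IllegalArgumentException("Parameter " + i + " has no @Field");
        }
        Object value = row.get(field.value());
        // Very small type check: reject values of the wrong runtime type.
        if (value != null && !params[i].getType().isInstance(value)) {
          throw new IllegalArgumentException(
              "Field '" + field.value() + "' is not a " + params[i].getType().getSimpleName());
        }
        args[i] = value;
      }
      try {
        m.setAccessible(true);
        return m.invoke(fn, args);
      } catch (ReflectiveOperationException e) {
        throw new IllegalStateException("Could not call " + m.getName(), e);
      }
    }
    throw new IllegalArgumentException("No @ProcessElement method found");
  }

  public static void main(String[] argv) {
    Map<String, Object> row = new HashMap<>();
    row.put("userId", "alice");
    row.put("latitude", 48.85);
    row.put("longitude", 2.35);
    row.put("unused", 123); // extra fields are simply ignored
    System.out.println(invoke(new GeoFn(), row));
  }
}
```

A real implementation would presumably resolve the bindings once per DoFn class rather than reflecting on every element, and a json-path or xpath annotation would evaluate an expression instead of doing a plain map lookup.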

Regards
JB

On 04/06/2018 16:39, Reuven Lax wrote:
> In the schema branch I have already added some annotations for Schema.
> However in the future I think we could go even further and allow users
> to pick individual fields out of the row schema. e.g. the user might
> have a Schema with 100 fields, but only want to process userId and geo
> location. I could imagine something like this
> 
> @ProcessElement void process(@Field("userId") String userId,
>     @Field("latitude") double lat, @Field("longitude") double lon) {
> }
> 
> And Beam could automatically extract the right fields for the user. In
> fact we could do the same thing with KVs today - supplying annotations
> to automatically unpack the KV.
> 
> I do think there are a few nice ways to do side inputs as well, but it's
> more work to design and implement, which is why I left it off (and given
> that there is some design work, side input annotations should be discussed
> on the dev list before implementation IMO).
> 
> Reuven
> 
> On Mon, Jun 4, 2018 at 5:29 PM Jean-Baptiste Onofré wrote:
> 
>     Hi Reuven,
> 
>     That's a great improvement for users.
> 
>     I don't see an easy way to add annotations for side inputs/outputs.
>     I think we can also plan some extension annotations for schemas, like
>     @Element(schema = foo) in addition to the type. Thoughts?
> 
>     Regards
>     JB
> 
>     On 04/06/2018 16:06, Reuven Lax wrote:
>     > Beam was created with an annotation-based processing API that allows
>     > the framework to automatically inject parameters to a DoFn's process
>     > method (and also allows the user to mark any method as the process
>     > method using @ProcessElement). However, these annotations were never
>     > completed. A specific set of parameters could be injected (e.g. the
>     > window or PipelineOptions), but for anything else you had to access it
>     > through the ProcessContext. This limited the readability advantage of
>     > this API.
>     >
>     > A couple of months ago I spent a bit of time extending the set of
>     > annotations allowed. In particular, the most common uses of
>     > ProcessContext were accessing the input element and outputting
>     > elements, and both of those can now be done without ProcessContext.
>     > Example usage:
>     >
>     > new DoFn<InputT, OutputT>() {
>     >   @ProcessElement
>     >   public void process(@Element InputT element,
>     >       OutputReceiver<OutputT> out) {
>     >     out.output(convertInputToOutput(element));
>     >   }
>     > }
>     >
>     > No need for ProcessContext anywhere in this DoFn! The Beam framework
>     > also does type checking - if the @Element type was not InputT, you
>     > would have seen an error. Multi-output DoFns also work, using a
>     > MultiOutputReceiver interface.
>     >
>     > I'll update the Beam docs later with this information, but most
>     > information accessible from ProcessContext, OnTimerContext,
>     > StartBundleContext, or FinishBundleContext can now be accessed via
>     > this sort of injection. The main exceptions are side inputs and
>     > output from @FinishBundle, both of which still require the context
>     > objects; however I hope to find time to provide direct access to
>     > those as well.
>     >
>     > pr/5331 (in progress) converts most of Beam's built-in transforms
>     > to use this clearer style.
>     >
>     > Reuven
> 
>     -- 
>     Jean-Baptiste Onofré
>     [email protected] <mailto:[email protected]>
>     http://blog.nanthrax.net
>     Talend - http://www.talend.com
> 

-- 
Jean-Baptiste Onofré
[email protected]
http://blog.nanthrax.net
Talend - http://www.talend.com
