I'll file a JIRA to track the idea. On Mon, Jun 4, 2018 at 5:52 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> Exactly, that's why something like @xpath or @json-path could be > interesting. > > Regards > JB > > On 04/06/2018 16:48, Reuven Lax wrote: > > Interesting. And given that Beam Schemas are recursive (a row can > > contain nested rows), we might actually need something like xpath if we > > want to make this fully general. > > > > Reuven > > > > On Mon, Jun 4, 2018 at 5:45 PM Jean-Baptiste Onofré <j...@nanthrax.net > > <mailto:j...@nanthrax.net>> wrote: > > > > Yup, it makes sense, it's what I had in mind. > > > > In Apache Camel, in a Processor (similar to a DoFn), we can also pass > > directly languages to the arguments. > > > > We can imagine something like: > > > > @ProcessElement void process(@json-path("foo") String foo) > > > > @ProcessElement void process(@xpath("//foo") String foo) > > > > or even a expression language (simple/groovy/whatever). > > > > Regards > > JB > > > > On 04/06/2018 16:39, Reuven Lax wrote: > > > In the schema branch I have already added some annotations for > Schema. > > > However in the future I think we could go even further and allow > users > > > to pick individual fields out of the row schema. e.g. the user > might > > > have a Schema with 100 fields, but only want to process userId and > geo > > > location. I could imagine something like this > > > > > > @ProcessElement void process(@Field("userId") String > > > userId, @Field("latitude") double lat, @Field("longitude") double > > long) { > > > } > > > > > > And Beam could automatically extract the right fields for the > user. In > > > fact we could do the same thing with KVs today - supplying > annotations > > > to automatically unpack the KV. > > > > > > I do think there are a few nice ways to do side inputs as well, > > but it's > > > more work to design implement which is why I left it off (and > > given that > > > there is some design work, side input annotations should be > > discussed on > > > the dev list before implementation IMO). > > > > > > Reuven > > > > > > On Mon, Jun 4, 2018 at 5:29 PM Jean-Baptiste Onofré > > <j...@nanthrax.net <mailto:j...@nanthrax.net> > > > <mailto:j...@nanthrax.net <mailto:j...@nanthrax.net>>> wrote: > > > > > > Hi Reuven, > > > > > > That's a great improvement for user. > > > > > > I don't see an easy way to have annotation about side > > input/output. > > > I think we can also plan some extension annotation about > > schema. Like > > > @Element(schema = foo) in addition of the type. Thoughts ? > > > > > > Regards > > > JB > > > > > > On 04/06/2018 16:06, Reuven Lax wrote: > > > > Beam was created with an annotation-based processing API, > > that allows > > > > the framework to automatically inject parameters to a DoFn's > > process > > > > method (and also allows the user to mark any method as the > > process > > > > method using @ProcessElement). However, these annotations > > were never > > > > completed. A specific set of parameters could be injected > > (e.g. the > > > > window or PipelineOptions), but for anything else you had to > > access it > > > > through the ProcessContext. This limited the readability > > advantage of > > > > this API. > > > > > > > > A couple of months ago I spent a bit of time extending the > > set of > > > > annotations allowed. In particular, the most common uses of > > > > ProcessContext were accessing the input element and > outputting > > > elements, > > > > and both of those can now be done without ProcessContext. > > Example > > > usage: > > > > > > > > new DoFn<InputT, OutputT>() { > > > > @ProcessElement process(@Element InputT element, > > > > OutputReceiver<OutputT> out) { > > > > out.output(convertInputToOutput(element)); > > > > } > > > > } > > > > > > > > No need for ProcessContext anywhere in this DoFn! The Beam > > framework > > > > also does type checking - if the @Element type was not > > InputT, you > > > would > > > > have seen an error. Multi-output DoFns also work, using a > > > > MultiOutputReceiver interface. > > > > > > > > I'll update the Beam docs later with this information, but > most > > > > information accessible from ProcessContext, OnTimerContext, > > > > StartBundleContext, or FinishBundleContext can now be > > accessed via > > > this > > > > sort of injection. The main exceptions are side inputs and > > output from > > > > finishbundle, both of which still require the context > objects; > > > however I > > > > hope to find time to provide direct access to those as well. > > > > > > > > pr/5331 (in progress) converts most of Beam's built-in > > transforms > > > to use > > > > this clearer style. > > > > > > > > Reuven > > > > > > -- > > > Jean-Baptiste Onofré > > > jbono...@apache.org <mailto:jbono...@apache.org> > > <mailto:jbono...@apache.org <mailto:jbono...@apache.org>> > > > http://blog.nanthrax.net > > > Talend - http://www.talend.com > > > > > > > -- > > Jean-Baptiste Onofré > > jbono...@apache.org <mailto:jbono...@apache.org> > > http://blog.nanthrax.net > > Talend - http://www.talend.com > > > > -- > Jean-Baptiste Onofré > jbono...@apache.org > http://blog.nanthrax.net > Talend - http://www.talend.com >