Ismaël - thanks, adding scripting language support to Beam is an awesome
idea and we should absolutely do it.

However, I think the current proposal can be made significantly more
general, and it would benefit from a formal design discussion. E.g., here
are a couple of points I can think of that seem very important but
currently aren't covered by the PR:
- Having the script return multiple values per element
- Scripting arbitrary user-code callbacks rather than a whole PTransform,
e.g. writing the various lambdas of FileIO.writeDynamic() in a scripting
language
- Integration with Beam SQL
- Specifying dependencies (does this require anything special?)

And less critical but also important or potentially very useful points:
- Support for side inputs and for multiple output tags
- Supporting asynchronous API calls from the script
- Supporting batching multiple elements together
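
To make the first point concrete, here is a minimal sketch (not code from
the PR) of one possible contract for scripts that return multiple values
per element: whatever the script evaluates to is flattened into zero or
more outputs. The class name MultiOutputScriptSketch, the helpers
toOutputs and process, and the binding name "element" are all made up for
illustration; only the javax.script calls are the standard JSR-223 API.
Some JDKs do not bundle a script engine, so the engine part is skipped
when none is found.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.script.Bindings;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

// Hypothetical sketch, not part of the PR or of the Beam API.
final class MultiOutputScriptSketch {

  /** Flattens a script's return value into zero or more outputs. */
  static List<Object> toOutputs(Object scriptResult) {
    List<Object> outputs = new ArrayList<>();
    if (scriptResult == null) {
      return outputs; // assumption: null means "emit nothing"
    }
    if (scriptResult instanceof Iterable) {
      // assumption: any Iterable means "emit each item separately"
      for (Object value : (Iterable<?>) scriptResult) {
        outputs.add(value);
      }
    } else {
      outputs.add(scriptResult); // any other value is a single output
    }
    return outputs;
  }

  /** Evaluates the script for one element, exposed to it as "element". */
  static List<Object> process(ScriptEngine engine, String script, Object element)
      throws ScriptException {
    Bindings bindings = engine.createBindings();
    bindings.put("element", element);
    return toOutputs(engine.eval(script, bindings));
  }

  public static void main(String[] args) throws ScriptException {
    // Engine-independent part of the contract.
    System.out.println(toOutputs(Arrays.asList("a", "b"))); // [a, b]
    System.out.println(toOutputs("a"));                     // [a]
    System.out.println(toOutputs(null));                    // []

    // Only runs if a JSR-223 engine is registered under this name.
    ScriptEngine engine = new ScriptEngineManager().getEngineByName("javascript");
    if (engine == null) {
      System.out.println("No script engine available, skipping eval demo");
      return;
    }
    System.out.println(process(engine, "element + element", "ab"));
  }
}
```

Treating null as "no output" and any Iterable as "many outputs" is only
one option; handing the script an explicit output receiver would be
another, and it would probably fit multiple output tags better.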

On Fri, Mar 23, 2018 at 12:09 PM Tyler Akidau <taki...@google.com> wrote:

> +1, I like it. Thanks!
>
> On Fri, Mar 23, 2018 at 9:03 AM Ahmet Altay <al...@google.com> wrote:
>
>> Thank you Ismaël, this looks really cool.
>>
>> On Fri, Mar 23, 2018 at 5:33 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
>> wrote:
>>
>>> Hi,
>>>
>>> it sounds like a very good extension mechanism to PTransform.
>>>
>>> +1
>>>
>>> Regards
>>> JB
>>>
>>> On 03/23/2018 12:03 PM, Ismaël Mejía wrote:
>>> > This is a really simple proposal to add an extension with transforms
>>> > that package the Java Scripting API (JSR-223) [1] to allow users to
>>> > specialize some transforms via a scripting language. This work was
>>> > initially created by Romain [2] and I just took it with his
>>> > authorization and refined it to make it pass all the Beam validations
>>> > + style. I also added ValueProviders so that users can now template
>>> > scripts in Dataflow as well.
>>> >
>>> > Notice that Dataflow recently added something similar to create really
>>> > simple data movement pipelines [3], so maybe the rest of the community
>>> > can benefit from a similar extension (and eventually Dataflow may
>>> > converge to this implementation).
>>> >
>>> > I hope there is interest in this extension. So far we have a
>>> > ScriptingParDo transform to show the idea; hopefully we can expand
>>> > this to other transforms.
>>> >
>>> > For those interested in more details you can check the Jira issue [4]
>>> > and the PR [5].
>>> >
>>> > [1] https://www.jcp.org/en/jsr/detail?id=223
>>> > [2] https://github.com/rmannibucau/beam-jsr223
>>> > [3] https://cloud.google.com/blog/big-data/2018/03/pre-built-cloud-dataflow-templates-kiss-for-data-movement
>>> > [4] https://issues.apache.org/jira/browse/BEAM-3921
>>> > [5] https://github.com/apache/beam/pull/4944
>>> >
>>>
>>> --
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>>
>>
