On Tue, Jan 22, 2019 at 11:35 AM Udi Meiri <eh...@google.com> wrote:

> Also debuggability: collecting logs from each of these systems.
>

Agree.


>
> On Tue, Jan 22, 2019 at 10:53 AM Chamikara Jayalath <chamik...@google.com>
> wrote:
>
>> Thanks Robert.
>>
>> On Tue, Jan 22, 2019 at 4:39 AM Robert Bradshaw <rober...@google.com>
>> wrote:
>>
>>> Now that we have the FnAPI, I started playing around with support for
>>> cross-language pipelines. This will allow things like IOs to be shared
>>> across all languages, SQL to be invoked from non-Java, TFX tensorflow
>>> transforms to be invoked from non-Python, etc. and I think is the next
>>> step in extending (and taking advantage of) the portability layer
>>> we've developed. These are often composite transforms whose inner
>>> structure depends in non-trivial ways on their configuration.
>>>
>>
>> Some additional benefits of cross-language transforms are given below.
>>
>> (1) The current large collection of Java IO connectors will become
>> available to other languages.
>> (2) Current Java and Python transforms will be available for Go and any
>> other future SDKs.
>> (3) New transform authors will be able to pick their language of choice
>> and make their transform available to all Beam SDKs. For example, this can
>> be the language the transform author is most familiar with or the only
>> language for which a client library is available for connecting to an
>> external data store.
>>
>>
>>> I created a PR [1] that basically follows the "expand via an external
>>> process" over RPC alternative from the proposals we came up with when
>>> we were discussing this last time [2]. There are still some unknowns,
>>> e.g. how to handle artifacts supplied by an alternative SDK (they
>>> currently must be provided by the environment), but I think this is a
>>> good incremental step forward that will already be useful in a large
>>> number of cases. It would be good to validate the general direction
>>> and I would be interested in any feedback others may have on it.
>>>
>>
>> I think there are multiple semi-dependent problems we have to tackle to
>> reach the final goal of supporting fully-fledged cross-language transforms
>> in Beam. I agree with taking an incremental approach here with the overall
>> vision in mind. Some other problems we have to tackle include the following.
>>
>> * Defining a user API that will allow pipelines defined in an SDK X to use
>> transforms defined in SDK Y.
>> * Updating various runners to use URN/payload-based environment definitions
>> [1]
>> * Updating various runners to support starting containers for multiple
>> environments/languages for the same pipeline and supporting executing
>> pipeline steps in containers started for multiple environments.
>>
>
I've been working with +Heejong Lee <heej...@google.com> to add some of the
missing pieces mentioned above.

We created the following doc, which captures some of the ongoing work related
to cross-language transforms and will hopefully serve as a knowledge base for
anybody who wishes to quickly learn the context around this effort.
Feel free to refer to it and/or add to it.

https://docs.google.com/document/d/1H3yCyVFI9xYs1jsiF1GfrDtARgWGnLDEMwG5aQIx2AU/edit?usp=sharing



>
>> Thanks,
>> Cham
>>
>> [1]
>> https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L952
>>
>>>
>>> - Robert
>>>
>>> [1] https://github.com/apache/beam/pull/7316
>>> [2] https://s.apache.org/beam-mixed-language-pipelines
>>>
>>
