On Tue, Jan 22, 2019 at 11:35 AM Udi Meiri <eh...@google.com> wrote:

> Also debuggability: collecting logs from each of these systems.
>
Agree.

> On Tue, Jan 22, 2019 at 10:53 AM Chamikara Jayalath <chamik...@google.com>
> wrote:
>
>> Thanks Robert.
>>
>> On Tue, Jan 22, 2019 at 4:39 AM Robert Bradshaw <rober...@google.com>
>> wrote:
>>
>>> Now that we have the FnAPI, I started playing around with support for
>>> cross-language pipelines. This will allow things like IOs to be shared
>>> across all languages, SQL to be invoked from non-Java, TFX TensorFlow
>>> transforms to be invoked from non-Python, etc., and I think it is the
>>> next step in extending (and taking advantage of) the portability layer
>>> we've developed. These are often composite transforms whose inner
>>> structure depends in non-trivial ways on their configuration.
>>>
>>
>> Some additional benefits of cross-language transforms are given below.
>>
>> (1) The current large collection of Java IO connectors will become
>> available to other languages.
>> (2) Current Java and Python transforms will be available for Go and any
>> other future SDKs.
>> (3) New transform authors will be able to pick their language of choice
>> and make their transform available to all Beam SDKs. For example, this can
>> be the language the transform author is most familiar with or the only
>> language for which a client library is available for connecting to an
>> external data store.
>>
>>
>>> I created a PR [1] that basically follows the "expand via an external
>>> process" over RPC alternative from the proposals we came up with when
>>> we were discussing this last time [2]. There are still some unknowns,
>>> e.g. how to handle artifacts supplied by an alternative SDK (they
>>> currently must be provided by the environment), but I think this is a
>>> good incremental step forward that will already be useful in a large
>>> number of cases. It would be good to validate the general direction
>>> and I would be interested in any feedback others may have on it.
>>>
>>
>> I think there are multiple semi-dependent problems we have to tackle to
>> reach the final goal of supporting fully-fledged cross-language transforms
>> in Beam. I agree with taking an incremental approach here with the overall
>> vision in mind. Some other problems we have to tackle include the
>> following.
>>
>> * Defining a user API that will allow pipelines defined in an SDK X to use
>> transforms defined in SDK Y.
>> * Updating various runners to use URN/payload based environment definition
>> [1]
>> * Updating various runners to support starting containers for multiple
>> environments/languages for the same pipeline and supporting executing
>> pipeline steps in containers started for multiple environments.
>>
>
I've been working with +Heejong Lee <heej...@google.com> to add some of the
missing pieces mentioned above. We created the following doc, which captures
some of the ongoing work related to cross-language transforms and which will
hopefully serve as a knowledge base for anybody who wishes to quickly learn
the context related to this. Feel free to refer to this and/or add to this.

https://docs.google.com/document/d/1H3yCyVFI9xYs1jsiF1GfrDtARgWGnLDEMwG5aQIx2AU/edit?usp=sharing

>
>> Thanks,
>> Cham
>>
>> [1]
>> https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L952
>>
>>>
>>> - Robert
>>>
>>> [1] https://github.com/apache/beam/pull/7316
>>> [2] https://s.apache.org/beam-mixed-language-pipelines
>>>
>>
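
For readers new to the thread, here is a rough Python sketch of the last two bullets above (URN/payload based environment definitions, and runners starting one container per environment). It is illustrative only and is not Beam's API: `Environment`, `Transform`, `group_by_environment`, the URN strings, and the image names are all hypothetical stand-ins, not the messages defined in beam_runner_api.proto.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Environment:
    """Hypothetical stand-in: an environment described by a URN plus an opaque payload."""
    urn: str        # identifies the kind of environment (e.g. a container)
    payload: bytes  # URN-specific configuration, opaque to the pipeline author


@dataclass(frozen=True)
class Transform:
    """Hypothetical pipeline step tagged with the id of the environment that runs it."""
    name: str
    environment_id: str


def group_by_environment(transforms: List[Transform]) -> Dict[str, List[str]]:
    """Group step names by environment id, preserving pipeline order.

    A runner supporting cross-language pipelines would need one running
    environment (for example, one container) per key in the result.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for t in transforms:
        groups[t.environment_id].append(t.name)
    return dict(groups)


# Illustrative values only; the URNs and image names are made up for this sketch.
environments = {
    "java": Environment("example:env:docker:v1", b"example/java-sdk-image"),
    "python": Environment("example:env:docker:v1", b"example/python-sdk-image"),
}
pipeline = [
    Transform("ReadFromSource", "java"),   # e.g. a Java IO used from a Python pipeline
    Transform("ParseRecords", "python"),
    Transform("WriteToSink", "java"),
]

steps_per_environment = group_by_environment(pipeline)
for env_id, steps in steps_per_environment.items():
    print(env_id, environments[env_id].urn, steps)
```

The point of the sketch is only that a single pipeline can reference several environments, so a runner has to bring up each distinct one and route every step to the environment it was tagged with.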