The current state is that it works, and a large amount of testing is being added [1], but the public API is still in flux (especially the Java-as-callee side [2] and the specification of dependencies [3, 4]). It is being actively worked on, though.

[1] https://github.com/apache/beam/pull/10051
[2] https://lists.apache.org/thread.html/d7a7fac2615ea15dbd9e66b1fb02a95bc125f5a4f8a897acc40fe408%40%3Cdev.beam.apache.org%3E
[3] https://lists.apache.org/thread.html/6fcee7047f53cf1c0636fb65367ef70842016d57effe2e5795c4137d%40%3Cdev.beam.apache.org%3E
[4] https://docs.google.com/document/d/1L7MJcfyy9mg2Ahfw5XPhUeBe-dyvAPMOYOiFA1-kAog/edit?usp=sharing

On Tue, Jan 21, 2020 at 2:31 AM Ismaël Mejía <[email protected]> wrote:
>
> Hello, we are in sync: I just came back to needing that same functionality.
> Last time I checked (end of November 2019) there were still many things
> missing. First, the External transform is not yet correctly exposed to SDK
> users (see the previous discussion [1] and Jira ticket BEAM-8546 [2]).
>
> I also hit file staging issues. I am not sure yet whether those were my
> problem or something that should be fixed too, but I will probably take a
> look at this soon. Do Max, Heejong, or anyone more familiar with
> cross-language pipelines have info on progress in this area?
>
> [1] https://lists.apache.org/thread.html/28f44041748deff8a587a149b4fcf0a8d13d219b32c5063979072474%40%3Cdev.beam.apache.org%3E
> [2] https://issues.apache.org/jira/browse/BEAM-8546
>
>
> On Tue, Jan 21, 2020 at 10:18 AM Michał Walenia <[email protected]> wrote:
>>
>> Is using Python from Java via ExternalTransform working and tested?
>>
>> On Tue, Jan 21, 2020 at 6:50 AM Reza Rokni <[email protected]> wrote:
>>>
>>> +1 for using cross language transforms.
>>>
>>> On Thu, 16 Jan 2020 at 01:23, Ahmet Altay <[email protected]> wrote:
>>>>
>>>> On Wed, Jan 15, 2020 at 8:12 AM Kamil Wasilewski <[email protected]> wrote:
>>>>>
>>>>> Based on your feedback, I think it'd be fine to deal with the problem
>>>>> as follows:
>>>>> * for Python: put the transforms into `sdks/python/apache_beam/io/gcp/ai`
>>>>> * for Java: create a `google-cloud-platform-ai` module in the
>>>>>   `sdks/java/extensions` folder
>>>>>
>>>>> As for cross language, we expect those transforms to be quite simple,
>>>>> so the cost of implementing them twice is not that high.
>>>>
>>>> One option would be to implement inference in a library like tfx_bsl [1].
>>>> It comes with a generalized Beam transform that can do inference either
>>>> from a saved model file or by using a service endpoint. The service
>>>> endpoint API option is there and could support cloud AI APIs. If we
>>>> utilize tfx_bsl, we will leverage the existing TFX integration and would
>>>> avoid creating a parallel set of transforms. Then for Java, we could
>>>> enable the same interface with cross language transform and offer a
>>>> unified inference API for both languages.
>>>>
>>>> [1] https://github.com/tensorflow/tfx-bsl/blob/a9f5b6128309595570cc6212f8076e7a20063ac2/tfx_bsl/beam/run_inference.py#L78
>>>>
>>>>>
>>>>> Thanks for your input,
>>>>> Kamil
>>>>>
>>>>> On Wed, Jan 15, 2020 at 7:58 AM Alex Van Boxel <[email protected]> wrote:
>>>>>>
>>>>>> If it's in Java, also be careful to align with the current Google
>>>>>> Cloud IOs, and certainly with their dependencies. The Google IOs do
>>>>>> not depend on the newest client libraries, and that's something we
>>>>>> sometimes struggle with when we depend on our own client libraries.
>>>>>> So make sure to align them.
>>>>>>
>>>>>> Also note that although gRPC is vendored, the Google IOs still have
>>>>>> their own dependency on gRPC, and this is the biggest reason for
>>>>>> trouble.
>>>>>>
>>>>>> _/
>>>>>> _/ Alex Van Boxel
>>>>>>
>>>>>>
>>>>>> On Wed, Jan 15, 2020 at 1:18 AM Luke Cwik <[email protected]> wrote:
>>>>>>>
>>>>>>> It depends on what language the client libraries are exposed in. For
>>>>>>> example, if the client libraries are in Java, sdks/java/extensions
>>>>>>> makes sense, while if it's Python then integrating it within the gcp
>>>>>>> extension within sdks/python/apache_beam makes sense.
>>>>>>>
>>>>>>> Adding additional dependencies is OK depending on the licensing, and
>>>>>>> the process is slightly different for each language.
>>>>>>>
>>>>>>> For transforms that are complicated, there is a cross-language effort
>>>>>>> going on so that one can execute one language's transforms within
>>>>>>> another language's pipeline, which may remove the need to write the
>>>>>>> transforms more than once.
>>>>>>>
>>>>>>> On Tue, Jan 14, 2020 at 7:43 AM Ismaël Mejía <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Nice idea. IO looks like a good place for them, but there is another
>>>>>>>> path that could fit this case: `sdks/java/extensions`, some module
>>>>>>>> like `google-cloud-platform-ai` in that folder or something like
>>>>>>>> that, no?
>>>>>>>>
>>>>>>>> In any case, great initiative. +1
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jan 14, 2020 at 4:22 PM Kamil Wasilewski <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> We’d like to implement a set of PTransforms that would allow users
>>>>>>>>> to use some of the Google Cloud AI services in Beam pipelines.
>>>>>>>>>
>>>>>>>>> Here's the full list of services and functionalities we’d like to
>>>>>>>>> integrate Beam with:
>>>>>>>>>
>>>>>>>>> * Video Intelligence [1]
>>>>>>>>> * Cloud Natural Language [2]
>>>>>>>>> * Cloud AI Platform Prediction [3]
>>>>>>>>> * Data Masking/Tokenization [4]
>>>>>>>>> * Inspecting image data for sensitive information using Cloud
>>>>>>>>>   Vision [5]
>>>>>>>>>
>>>>>>>>> However, we're not sure whether to put those transforms directly
>>>>>>>>> into Beam, because they would require some additional GCP
>>>>>>>>> dependencies. One of our ideas is a separate library that depends
>>>>>>>>> on Beam and can be installed optionally, stored somewhere in the
>>>>>>>>> Beam repository (e.g. in the BEAM_ROOT/extras directory). Do you
>>>>>>>>> think it is a reasonable approach? Or maybe it is totally fine to
>>>>>>>>> put them into SDKs, just like other IOs?
>>>>>>>>>
>>>>>>>>> If you have any other thoughts, do not hesitate to let us know.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Kamil
>>>>>>>>>
>>>>>>>>> [1] https://cloud.google.com/video-intelligence/
>>>>>>>>> [2] https://cloud.google.com/natural-language/
>>>>>>>>> [3] https://cloud.google.com/ml-engine/docs/prediction-overview
>>>>>>>>> [4] https://cloud.google.com/dataflow/docs/guides/templates/provided-streaming#dlptexttobigquerystreaming
>>>>>>>>> [5] https://cloud.google.com/vision/
