Hello, we are in sync: I was just getting back to needing exactly that functionality. Last time I checked (end of November 2019), many pieces were still missing. First, the External transform is not yet correctly exposed to SDK users (see the previous discussion [1] and Jira ticket BEAM-8546 [2]).
I also hit file staging issues. I am not sure yet whether those were a problem on my side or something that should be fixed too, but I will probably take a look at this soon.

Max, Heejong, or anyone more familiar with cross-language pipelines: do you have any info on progress in this area?

[1] https://lists.apache.org/thread.html/28f44041748deff8a587a149b4fcf0a8d13d219b32c5063979072474%40%3Cdev.beam.apache.org%3E
[2] https://issues.apache.org/jira/browse/BEAM-8546

On Tue, Jan 21, 2020 at 10:18 AM Michał Walenia <[email protected]> wrote:

> Is using Python from Java via ExternalTransform working and tested?
>
> On Tue, Jan 21, 2020 at 6:50 AM Reza Rokni <[email protected]> wrote:
>
>> +1 for using cross language transforms.
>>
>> On Thu, 16 Jan 2020 at 01:23, Ahmet Altay <[email protected]> wrote:
>>
>>> On Wed, Jan 15, 2020 at 8:12 AM Kamil Wasilewski <[email protected]> wrote:
>>>
>>>> Based on your feedback, I think it'd be fine to deal with the problem
>>>> as follows:
>>>> * for Python: put the transforms into `sdks/python/apache_beam/io/gcp/ai`
>>>> * for Java: create a `google-cloud-platform-ai` module in
>>>>   `sdks/java/extensions` folder
>>>>
>>>> As for cross language, we expect those transforms to be quite simple,
>>>> so the cost of implementing them twice is not that high.
>>>
>>> One option would be to implement inference in a library like tfx_bsl
>>> [1]. It comes with a generalized Beam transform that can do inference
>>> either from a saved model file or by using a service endpoint. The service
>>> endpoint API option is there and could support cloud AI APIs. If we utilize
>>> tfx_bsl, we will leverage the existing TFX integration and would avoid
>>> creating a parallel set of transforms. Then for Java, we could enable the
>>> same interface with cross language transform and offer a unified inference
>>> API for both languages.
>>>
>>> [1]
>>> https://github.com/tensorflow/tfx-bsl/blob/a9f5b6128309595570cc6212f8076e7a20063ac2/tfx_bsl/beam/run_inference.py#L78
>>>
>>>> Thanks for your input,
>>>> Kamil
>>>>
>>>> On Wed, Jan 15, 2020 at 7:58 AM Alex Van Boxel <[email protected]> wrote:
>>>>
>>>>> If it's in Java also be careful to align with the current google cloud
>>>>> IO's, certainly its dependencies. The google IO's are not depending on
>>>>> the newest client libraries and that's something we're sometimes
>>>>> struggling with when we depend on our own client libraries. So make
>>>>> sure to align them.
>>>>>
>>>>> Also note that although gRPC is vendored, the google IO's do still
>>>>> have their own dependency on gRPC and this is the biggest reason for
>>>>> trouble.
>>>>>
>>>>> _/
>>>>> _/ Alex Van Boxel
>>>>>
>>>>> On Wed, Jan 15, 2020 at 1:18 AM Luke Cwik <[email protected]> wrote:
>>>>>
>>>>>> It depends on what language the client libraries are exposed in. For
>>>>>> example, if the client libraries are in Java, sdks/java/extensions makes
>>>>>> sense while if it's Python then integrating it within the gcp extension
>>>>>> within sdks/python/apache_beam makes sense.
>>>>>>
>>>>>> Adding additional dependencies is ok depending on the licensing and
>>>>>> the process is slightly different for each language.
>>>>>>
>>>>>> For transforms that are complicated, there is a cross language effort
>>>>>> going on so that one can execute one language's transforms within another
>>>>>> language's pipeline which may remove the need to write the transforms more
>>>>>> than once.
>>>>>>
>>>>>> On Tue, Jan 14, 2020 at 7:43 AM Ismaël Mejía <[email protected]> wrote:
>>>>>>
>>>>>>> Nice idea, IO looks like a good place for them but there is another
>>>>>>> path that could fit this case: `sdks/java/extensions`, some module like
>>>>>>> `google-cloud-platform-ai` in that folder or something like that, no?
>>>>>>>
>>>>>>> In any case great initiative.
>>>>>>> +1
>>>>>>>
>>>>>>> On Tue, Jan 14, 2020 at 4:22 PM Kamil Wasilewski <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> We’d like to implement a set of PTransforms that would allow users
>>>>>>>> to use some of the Google Cloud AI services in Beam pipelines.
>>>>>>>>
>>>>>>>> Here's the full list of services and functionalities we’d like to
>>>>>>>> integrate Beam with:
>>>>>>>>
>>>>>>>> * Video Intelligence [1]
>>>>>>>> * Cloud Natural Language [2]
>>>>>>>> * Cloud AI Platform Prediction [3]
>>>>>>>> * Data Masking/Tokenization [4]
>>>>>>>> * Inspecting image data for sensitive information using Cloud
>>>>>>>>   Vision [5]
>>>>>>>>
>>>>>>>> However, we're not sure whether to put those transforms directly
>>>>>>>> into Beam, because they would require some additional GCP
>>>>>>>> dependencies. One of our ideas is a separate library, that depends on
>>>>>>>> Beam and that can be installed optionally, stored somewhere in the
>>>>>>>> beam repository (e.g. in the BEAM_ROOT/extras directory). Do you think
>>>>>>>> it is a reasonable approach? Or maybe it is totally fine to put them
>>>>>>>> into SDKs, just like other IOs?
>>>>>>>>
>>>>>>>> If you have any other thoughts, do not hesitate to let us know.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Kamil
>>>>>>>>
>>>>>>>> [1] https://cloud.google.com/video-intelligence/
>>>>>>>> [2] https://cloud.google.com/natural-language/
>>>>>>>> [3] https://cloud.google.com/ml-engine/docs/prediction-overview
>>>>>>>> [4] https://cloud.google.com/dataflow/docs/guides/templates/provided-streaming#dlptexttobigquerystreaming
>>>>>>>> [5] https://cloud.google.com/vision/
>
> --
> Michał Walenia
> Polidea <https://www.polidea.com/> | Software Engineer
