The current state is that it works, and a large amount of testing is
being added [1], but the public API is still in flux (especially the
java-as-callee side [2], and the specification of dependencies [3,4]).
It is being actively worked on though.

[1] https://github.com/apache/beam/pull/10051
[2] 
https://lists.apache.org/thread.html/d7a7fac2615ea15dbd9e66b1fb02a95bc125f5a4f8a897acc40fe408%40%3Cdev.beam.apache.org%3E
[3] 
https://lists.apache.org/thread.html/6fcee7047f53cf1c0636fb65367ef70842016d57effe2e5795c4137d%40%3Cdev.beam.apache.org%3E
[4] 
https://docs.google.com/document/d/1L7MJcfyy9mg2Ahfw5XPhUeBe-dyvAPMOYOiFA1-kAog/edit?usp=sharing

On Tue, Jan 21, 2020 at 2:31 AM Ismaël Mejía <[email protected]> wrote:
>
> Hello, we are in sync: I was just getting back to needing that same 
> functionality. Last time I checked (end of November 2019) there were still 
> many things missing. First, the External transform is not yet correctly 
> exposed to SDK users (see the previous discussion [1] and Jira ticket 
> BEAM-8546 [2]).
>
> I also hit file staging issues. I am not sure yet whether those were my own 
> problem or something that should be fixed too, but I will probably take a 
> look at this soon. Do Max, Heejong, or anyone more familiar with 
> cross-language pipelines have info on progress in this area?
>
> [1] 
> https://lists.apache.org/thread.html/28f44041748deff8a587a149b4fcf0a8d13d219b32c5063979072474%40%3Cdev.beam.apache.org%3E
> [2] https://issues.apache.org/jira/browse/BEAM-8546
>
>
> On Tue, Jan 21, 2020 at 10:18 AM Michał Walenia <[email protected]> 
> wrote:
>>
>> Is using Python from Java via ExternalTransform working and tested?
>>
>> On Tue, Jan 21, 2020 at 6:50 AM Reza Rokni <[email protected]> wrote:
>>>
>>> +1 for using cross language transforms.
>>>
>>> On Thu, 16 Jan 2020 at 01:23, Ahmet Altay <[email protected]> wrote:
>>>>
>>>>
>>>>
>>>> On Wed, Jan 15, 2020 at 8:12 AM Kamil Wasilewski 
>>>> <[email protected]> wrote:
>>>>>
>>>>> Based on your feedback, I think it'd be fine to deal with the problem as 
>>>>> follows:
>>>>> * for Python: put the transforms into `sdks/python/apache_beam/io/gcp/ai`
>>>>> * for Java: create a `google-cloud-platform-ai` module in 
>>>>> `sdks/java/extensions` folder
>>>>>
>>>>> As for cross language, we expect those transforms to be quite simple, so 
>>>>> the cost of implementing them twice is not that high.
>>>>
>>>>
>>>> One option would be to implement inference in a library like tfx_bsl [1]. 
>>>> It comes with a generalized Beam transform that can do inference either 
>>>> from a saved model file or by using a service endpoint. The service 
>>>> endpoint API option is there and could support cloud AI APIs. If we 
>>>> utilize tfx_bsl, we will leverage the existing TFX integration and would 
>>>> avoid creating a parallel set of transforms. Then for Java, we could 
>>>> enable the same interface with cross language transform and offer a 
>>>> unified inference API for both languages.
>>>>
>>>> [1] 
>>>> https://github.com/tensorflow/tfx-bsl/blob/a9f5b6128309595570cc6212f8076e7a20063ac2/tfx_bsl/beam/run_inference.py#L78
>>>>
>>>>
>>>>>
>>>>>
>>>>> Thanks for your input,
>>>>> Kamil
>>>>>
>>>>> On Wed, Jan 15, 2020 at 7:58 AM Alex Van Boxel <[email protected]> wrote:
>>>>>>
>>>>>> If it's in Java, also be careful to align with the current Google Cloud 
>>>>>> IOs, and certainly with their dependencies. The Google IOs do not depend 
>>>>>> on the newest client libraries, and that's something we sometimes 
>>>>>> struggle with when we depend on our own client libraries. So make sure 
>>>>>> to align them.
>>>>>>
>>>>>> Also note that although gRPC is vendored, the Google IOs still have 
>>>>>> their own dependency on gRPC, and this is the biggest source of trouble.
>>>>>>
>>>>>>  _/
>>>>>> _/ Alex Van Boxel
>>>>>>
>>>>>>
>>>>>> On Wed, Jan 15, 2020 at 1:18 AM Luke Cwik <[email protected]> wrote:
>>>>>>>
>>>>>>> It depends on what language the client libraries are exposed in. For 
>>>>>>> example, if the client libraries are in Java, sdks/java/extensions 
>>>>>>> makes sense, while if it's Python, then integrating it within the gcp 
>>>>>>> extension within sdks/python/apache_beam makes sense.
>>>>>>>
>>>>>>> Adding additional dependencies is OK depending on the licensing, and 
>>>>>>> the process is slightly different for each language.
>>>>>>>
>>>>>>> For transforms that are complicated, there is a cross-language effort 
>>>>>>> going on so that one can execute one language's transforms within 
>>>>>>> another language's pipeline, which may remove the need to write the 
>>>>>>> transforms more than once.
>>>>>>>
>>>>>>> On Tue, Jan 14, 2020 at 7:43 AM Ismaël Mejía <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Nice idea, IO looks like a good place for them but there is another 
>>>>>>>> path that could fit this case: `sdks/java/extensions`, some module 
>>>>>>>> like `google-cloud-platform-ai` in that folder or something like that, 
>>>>>>>> no?
>>>>>>>>
>>>>>>>> In any case great initiative. +1
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jan 14, 2020 at 4:22 PM Kamil Wasilewski 
>>>>>>>> <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> We’d like to implement a set of PTransforms that would allow users to 
>>>>>>>>> use some of the Google Cloud AI services in Beam pipelines.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Here's the full list of services and functionalities we’d like to 
>>>>>>>>> integrate Beam with:
>>>>>>>>>
>>>>>>>>> * Video Intelligence [1]
>>>>>>>>>
>>>>>>>>> * Cloud Natural Language [2]
>>>>>>>>>
>>>>>>>>> * Cloud AI Platform Prediction [3]
>>>>>>>>>
>>>>>>>>> * Data Masking/Tokenization [4]
>>>>>>>>>
>>>>>>>>> * Inspecting image data for sensitive information using Cloud Vision 
>>>>>>>>> [5]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> However, we're not sure whether to put those transforms directly into 
>>>>>>>>> Beam, because they would require some additional GCP dependencies. 
>>>>>>>>> One of our ideas is a separate library that depends on Beam, can be 
>>>>>>>>> installed optionally, and is stored somewhere in the Beam repository 
>>>>>>>>> (e.g. in the BEAM_ROOT/extras directory). Do you think it is a 
>>>>>>>>> reasonable approach? Or maybe it is totally fine to put them into 
>>>>>>>>> SDKs, just like other IOs?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> If you have any other thoughts, do not hesitate to let us know.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>>
>>>>>>>>> Kamil
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [1] https://cloud.google.com/video-intelligence/
>>>>>>>>>
>>>>>>>>> [2] https://cloud.google.com/natural-language/
>>>>>>>>>
>>>>>>>>> [3] https://cloud.google.com/ml-engine/docs/prediction-overview
>>>>>>>>>
>>>>>>>>> [4] 
>>>>>>>>> https://cloud.google.com/dataflow/docs/guides/templates/provided-streaming#dlptexttobigquerystreaming
>>>>>>>>>
>>>>>>>>> [5] https://cloud.google.com/vision/
>>>
>>>
>>>
>>
>>
>>
>> --
>>
>> Michał Walenia
>> Polidea | Software Engineer
>>
>> M: +48 791 432 002
>> E: [email protected]
>>
>> Unique Tech
>> Check out our projects!
