Hi JB,

On Fri, Nov 25, 2016 at 2:36 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:
>
> By the way, you can also use TensorFrame, which allows you to use TensorFlow
> directly with Spark dataframes, with more direct access. I discussed that
> with Tim Hunter from Databricks, who's working on TensorFrame.
>

Yes, we have been discussing and experimenting a bit with TensorFrame. The
work is very interesting, although it has some limitations. More importantly,
using it would mean taking a step back in our plan of getting away from the
specifics of the concrete processing engine.


> Back on Beam, here is what you could do:
>
> 1. you expose the service on a microservice container (for instance Apache
> Karaf ;))
> In your pipeline, you then have two options:
> 2.a. in your Beam pipeline, in a DoFn, in the @Setup you can create the
> REST client (using CXF, or whatever), and in the @ProcessElement you can
> use the service (hosted by Karaf)
>

Besides using a different microservice infrastructure, I have already started
to play with DoFn and the concepts around it.
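
Just to check that I got option 2.a right, here is a minimal sketch of what I
have in mind (the endpoint URL, the String element types, and the one-line
response are placeholders, and I'm using plain HttpURLConnection instead of
CXF just to keep the example short):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

import org.apache.beam.sdk.transforms.DoFn;

public class ClassifyFn extends DoFn<String, String> {

  // placeholder for the endpoint exposed by the microservice container
  private static final String ENDPOINT = "http://localhost:8181/classify";

  private transient URL url;

  @Setup
  public void setup() throws Exception {
    // done once per DoFn instance, not once per element
    url = new URL(ENDPOINT);
  }

  @ProcessElement
  public void processElement(ProcessContext c) throws Exception {
    // one REST call per element, reusing what was prepared in @Setup
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("POST");
    conn.setDoOutput(true);
    try (OutputStream out = conn.getOutputStream()) {
      out.write(c.element().getBytes(StandardCharsets.UTF_8));
    }
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
      c.output(in.readLine()); // assuming a one-line response
    }
  }
}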


> 2.b. I also have a RestIO (source and sink) that can request a REST
> endpoint. However, for now, this IO acts as a pipeline endpoint
> (PTransform<PBegin, PCollection> or PTransform<PCollection, PDone>). In
> your case, if the service called is a step of your pipeline, ParDo(your
> DoFn) would be easier.
>

Yes, that was what I understood of the Beam design: an IO is expected at the
head or the tail of the pipeline.
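
So, if I understood correctly, the wiring would look roughly like this (TextIO
and the paths are just placeholders for whatever source and sink we actually
use, and ClassifyFn is the DoFn sketched above):

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.ParDo;

PipelineOptions options = PipelineOptionsFactory.create();
Pipeline p = Pipeline.create(options);
p.apply(TextIO.Read.from("gs://bucket/input*"))  // head: source IO
 .apply(ParDo.of(new ClassifyFn()))              // middle step: the REST call
 .apply(TextIO.Write.to("gs://bucket/output"));  // tail: sink IO
p.run();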


> Is that what you mean by a microservice?
>

Yep, exactly that.

Thanks so much!

On 11/25/2016 01:18 PM, Sergio Fernández wrote:

> Hi JB,
>
> On Tue, Nov 22, 2016 at 11:14 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
>>
>> A DoFn executes per element (optionally with hooks on StartBundle,
>> FinishBundle, and Teardown). It's basically the way it works in an IO's
>> WriteFn: we create the connection in StartBundle and send each element
>> (in a batch) to the external resource.
>>
>> A PTransform is maybe more flexible in the case of interacting with
>> "outside" resources.
>>
>>
> Probably a PTransform would be a better place. I'm still pretty new to some
> of the Beam terms and APIs.
>
> Do you have a use case, so I can be sure I understand?
>
>
> Yes. Well, it's far more complex, but for this question I can simplify it:
>
> We have a TensorFlow-based classifier. In our pipeline, one step performs
> that classification of the data. Currently it's implemented as a Spark
> Function, because TensorFlow models can be embedded directly within
> pipelines using PySpark.
>
> Therefore I'm looking for the best option to move such a classification
> process one level up in abstraction with Beam, so I can make it portable.
> The first idea I'm exploring is relying on an external function (i.e., a
> microservice) that I'd need to scale up and down independently of the
> pipeline. So I'm more than happy to discuss ideas ;-)
>
> Thanks.
>
> Cheers,
>
>
>
> On 11/22/2016 10:39 AM, Sergio Fernández wrote:
>>>
>>> Hi,
>>>
>>> I'd like to resume the idea of having TensorFlow-based tasks running in a
>>> Beam pipeline. So far the cleanest approach I can imagine would be to have
>>> them running outside (Functions in GCP, Lambdas in AWS, microservices
>>> generally speaking).
>>>
>>> Therefore, does the current Beam model provide the notion of a DoFn which
>>> actually runs externally?
>>>
>>> Thanks in advance for the feedback.
>>>
>>> Cheers,
>>>
>>>
>>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>> --
>> Sergio Fernández
>> Partner Technology Manager
>> Redlink GmbH
>> m: +43 6602747925
>> e: sergio.fernan...@redlink.co
>> w: http://redlink.co
>>
>>
>
-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


-- 
Sergio Fernández
Partner Technology Manager
Redlink GmbH
m: +43 6602747925
e: sergio.fernan...@redlink.co
w: http://redlink.co
