One interesting point that Sergio mentions, and that is getting lost in the discussion, is how to integrate other dataflow-style frameworks into Beam, e.g. TensorFlow. I am really curious about what the others have to say about this, since it is probably a question that will come up once more users write pipelines on Beam. Any ideas on this? Or is the solution just to write some 'integration PTransforms' and that's it?
Regards,
Ismaël

ps. I forgot to say hi and welcome, Sergio :)

On Wed, Jun 15, 2016 at 11:18 AM, Jean-Baptiste Onofré <[email protected]> wrote:

> Not the Beam Model for sure (the Beam Model is about the pipeline design).
>
> The Beam Runner API can help there, but the final implementation is in the
> runner itself.
>
> Regards
> JB
>
> On 06/15/2016 10:18 AM, Sergio Fernández wrote:
>
>> Hi Jean-Baptiste,
>>
>> On Tue, Jun 14, 2016 at 12:45 PM, Jean-Baptiste Onofré <[email protected]>
>> wrote:
>>
>>> Welcome aboard, and good to discuss with you during ApacheCon.
>>
>> It was nice to put faces to you all ;-)
>>
>>> Distribution of the resources is a point related to the runner, and more
>>> specifically to the execution environment of the runner. Each
>>> runner/backend will implement its own logic.
>>
>> Yes, I understand. But I wonder if the Beam Model provides any
>> primitive to deal with such aspects in an abstract way. I guess I'd need
>> to go deeper into Beam to approach you with more concrete questions, so
>> for now it's fine.
>>
>>> Regarding the Python SDK, we discussed it last week: it's on the
>>> way. We should have the Python SDK very soon (we were busy with the
>>> first release).
>>
>> Yep, I knew that was the plan. It's really cool to have it already in
>> master for the next release :-)
>>
>> Thanks.
>>
>>> On 06/14/2016 12:38 PM, Sergio Fernández wrote:
>>>
>>>> Hi guys,
>>>>
>>>> I'm a newbie in the Beam community, but as someone who has used Dataflow
>>>> in the past I've been following the podling since you came to the ASF.
>>>> I'm very happy to see that 0.1.0-incubating is finally going out;
>>>> congratulations on such a great milestone.
>>>>
>>>> I discussed with some of you at the last ApacheCon, and it was good to
>>>> learn that the Python SDK was just a matter of time and should come to
>>>> Beam at some point. So, coming back to the original plans
>>>> <http://beam.incubator.apache.org/beam/python/sdk/2016/02/25/python-sdk-now-public.html>,
>>>> do you have any timeline for bringing the Python SDK to Beam?
>>>>
>>>> I'd also like to ask how Beam plans to deal with the distribution of
>>>> resources across all nodes, something I know is not really clean with
>>>> some runners (e.g., Spark). More concretely, we're using Keras
>>>> <http://keras.io/>, a deep learning Python library that is capable of
>>>> running on top of either TensorFlow or Theano. Historically, I know
>>>> Dataflow and TensorFlow are not very compatible, but I wonder if the
>>>> project has already discussed how to support running Keras (TensorFlow)
>>>> tasks on Beam. For us it is more for querying than for training, so I'd
>>>> like to know if the Beam Model could natively support the distribution
>>>> of the models (sometimes several GB).
>>>>
>>>> Thanks in advance.
>>>>
>>>> Cheers,
>>>
>>> --
>>> Jean-Baptiste Onofré
>>> [email protected]
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>
> --
> Jean-Baptiste Onofré
> [email protected]
> http://blog.nanthrax.net
> Talend - http://www.talend.com
