Hi Sergio,

Welcome aboard, and it was good to talk with you during ApacheCon.

Distribution of resources is a concern of the runner, and more specifically of the runner's execution environment. Each runner/backend implements its own logic for this.

I don't know Keras well enough to give strong advice.

Regarding the Python SDK, we discussed that last week: it's on the way. We should have the Python SDK very soon (we were busy with the first release).

Regards
JB

On 06/14/2016 12:38 PM, Sergio Fernández wrote:
Hi guys,

I'm a newbie in the Beam community, but as someone who has used DataFlow in
the past I've been following the podling since you came to the ASF. I'm very
happy to see that 0.1.0-incubating is finally going out; congratulations
on such a great milestone.

I talked with some of you at the last ApacheCon, and it was good for me to
know that the Python SDK was just a matter of time and should come to Beam
at some point. So, coming back to the original plans <
http://beam.incubator.apache.org/beam/python/sdk/2016/02/25/python-sdk-now-public.html>,
do you have any timeline for bringing the Python SDK to Beam?

I'd also like to ask how Beam plans to deal with the distribution of
resources across all nodes, something I know is not really clean with some
runners (e.g., Spark). More concretely, we're using Keras <
http://keras.io/>, a deep learning Python library that can run on top of
either TensorFlow or Theano. Historically, I know DataFlow and TensorFlow
have not been very compatible, but I wonder whether the project has already
discussed how to support running Keras (TensorFlow) tasks on Beam. For us
it's more for querying than for training, so I'd like to know whether the
Beam Model could natively support the distribution of the models (sometimes
several GB).

Thanks in advance.

Cheers,


--
Jean-Baptiste Onofré
[email protected]
http://blog.nanthrax.net
Talend - http://www.talend.com
