Hi Davor,

On Thu, Jun 16, 2016 at 3:04 AM, Davor Bonaci <[email protected]>
wrote:

> This is a really good question, Sergio. You got right away to the crux of
> the problem -- how to express such pattern in the Beam model.
>
> The answer depends whether the data is static, e.g., whether it is known at
> pipeline construction time / computed in the earlier stages of the
> pipeline, or perhaps evolving during pipeline execution. I'll give a
> high-level answer -- feel free to share more information about your use
> case and we can drill into specific details.
>

Well, as I said, for us it is more interesting to use Beam at processing
time than for training purposes. In the past we have experimented a bit with
approaches like TensorSpark <https://github.com/adatao/tensorspark>, but
the critical aspect for us is the exploitation of the models. Therefore we
can assume the models are static data.



> In the simplest case, Beam supports "files to stage" concept if the data is
> known apriori. In this case, runners will distribute the data to all
> workers before computation starts, and your logic can depend on the data
> being available locally on each worker.
>

Oh, cool. Something like that would be more than enough for now. Can you
please point me to any documentation or code I could use to play with it?


> If this is not sufficient, Beam's side inputs are the right primitive. We
> support several access patterns for side inputs, including distributed
> lookup and various types of caching. This can work really well,
> particularly with a well-optimized runner.
>

Interesting... is there any (early) documentation (or code) about this feature?



> Other alternatives typically include access to a shared storage, which is a
> lower-level approach and often requires more work.


Sure, shared storage is always an option, but for many reasons I'd rather
not resort to such an approach.

Thanks so much for all the ideas and valuable discussions!

Cheers,

-- 
Sergio Fernández
Partner Technology Manager
Redlink GmbH
m: +43 6602747925
e: [email protected]
w: http://redlink.co
