Hi Davor,

On Thu, Jun 16, 2016 at 3:04 AM, Davor Bonaci <[email protected]> wrote:
> This is a really good question, Sergio. You got right away to the crux of
> the problem -- how to express such pattern in the Beam model.
>
> The answer depends whether the data is static, e.g., whether it is known at
> pipeline construction time / computed in the earlier stages of the
> pipeline, or perhaps evolving during pipeline execution. I'll give a
> high-level answer -- feel free to share more information about your use
> case and we can drill into specific details.
>

Well, as I said, for us it is more interesting to use Beam at processing time than for training purposes. In the past we have experimented a bit with approaches like TensorSpark <https://github.com/adatao/tensorspark>, but the critical aspect is exploitation of the models. Therefore we could assume the models are static data.

> In the simplest case, Beam supports "files to stage" concept if the data is
> known apriori. In this case, runners will distribute the data to all
> workers before computation starts, and your logic can depend on the data
> being available locally on each worker.
>

Oh, cool. Something like that would be more than enough for now. Can you please point me to any documentation or code I could use to play with it?

> If this is not sufficient, Beam's side inputs are the right primitive. We
> support several access patterns for side inputs, including distributed
> lookup and various types of caching. This can work really well,
> particularly with a well-optimized runner.
>

Interesting... is there any (early) documentation (or code) about this feature?

> Other alternatives typically include access to a shared storage, which is a
> lower-level approach and often requires more work.

Sure, shared storage is always an option, but for many reasons I'd rather not resort to such an approach.

Thanks so much for all the ideas and valuable discussions!

Cheers,

-- 
Sergio Fernández
Partner Technology Manager
Redlink GmbH
m: +43 6602747925
e: [email protected]
w: http://redlink.co
