Re: Streaming - lookup against reference data

2016-09-15 Thread Tom Davis
Thanks Jörn, sounds like there's nothing obvious I'm missing, which is encouraging. I've not used Redis, but it does seem that for most of my current and likely future use-cases it would be the best fit (nice compromise of scale and easy setup / access). Thanks, Tom On Wed, Sep 14, 2016 at

Re: Streaming - lookup against reference data

2016-09-14 Thread Jörn Franke
Hmm, is it just a lookup, and are the values small? In that case I do not think Redis needs to be installed on each worker node. Redis has a rather efficient protocol, so one or a few dedicated Redis nodes will probably be more than enough for your purpose. Just try to reuse connections and do
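
To illustrate the connection-reuse advice, here is a minimal Python sketch of one way it might look: each worker process lazily creates a single client and shares it across all partitions it handles, rather than opening a connection per record. The names get_client, lookup_partition and factory are hypothetical and not from the thread; factory stands in for whatever builds the real client (with the redis-py package, something like lambda: redis.Redis(host="redis-host", port=6379), where the host name is a placeholder).

```python
import threading

_lock = threading.Lock()
_client = None  # at most one client per Python worker process


def get_client(factory):
    """Return the process-wide client, creating it on first use.

    `factory` is a hypothetical zero-argument callable that builds the client.
    """
    global _client
    if _client is None:
        with _lock:
            # Re-check inside the lock so two threads cannot both create a client.
            if _client is None:
                _client = factory()
    return _client


def lookup_partition(records, factory):
    """Enrich one partition of (key, value) records using the shared client.

    Assumes the client exposes get(key), as a Redis client does.
    """
    client = get_client(factory)
    for key, value in records:
        yield key, value, client.get(key)
```

In a PySpark job this could be applied per partition, for example stream.mapPartitions(lambda part: lookup_partition(part, make_client)), with make_client being the assumed factory. The point of the design is that the cost of connecting is paid once per worker process, not once per record or per batch.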

Streaming - lookup against reference data

2016-09-14 Thread Tom Davis
Hi all, I'm interested in the patterns people use in the wild for lookups against reference data sets from a Spark Streaming job. The reference data set will be updated during the life of the job (although being 30 minutes out of date wouldn't be an issue, for example). So far I have come up with a few
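
As a rough illustration of the staleness tolerance described in this question, one possible pattern (not necessarily one of those the poster had in mind) is to keep the reference data in memory and reload it only once it is older than a maximum age. The sketch below assumes the whole data set fits in memory as a dict; the class name RefreshingReference and the loader callable are hypothetical, and the 1800-second default simply mirrors the 30 minutes mentioned above.

```python
import time


class RefreshingReference:
    """Holds reference data in memory and reloads it once it exceeds max_age_s."""

    def __init__(self, loader, max_age_s=1800.0, clock=time.monotonic):
        self._loader = loader        # hypothetical callable returning a mapping
        self._max_age_s = max_age_s  # tolerated staleness in seconds
        self._clock = clock          # injectable so the refresh logic can be tested
        self._data = None
        self._loaded_at = None

    def get(self, key, default=None):
        now = self._clock()
        # Reload on first use, or when the cached copy has reached the age limit.
        if self._data is None or now - self._loaded_at >= self._max_age_s:
            self._data = dict(self._loader())
            self._loaded_at = now
        return self._data.get(key, default)
```

The reload happens lazily on the lookup path, so the first lookup after expiry pays the load cost; a background refresh would avoid that at the price of more moving parts.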