Interesting. Thanks for sharing~

On Fri, May 4, 2018 at 2:31 PM, Thomas Cooper <tom.n.coo...@gmail.com> wrote:
> Hi all,
>
> A while ago I emailed about the issue that fields (key) grouped routing in Heron is not consistent across an update. This makes preserving state across an update very difficult, and it also makes it difficult or impossible to analyse or predict tuple flows through a current or proposed topology physical plan.
>
> I suggested adopting Storm's approach of pre-defining a routing key space for each component (e.g. 0-999), so that instead of an instance having a single task ID that gets reset at every update (e.g. 10), it has a range of IDs (e.g. 10-16) that changes depending on the parallelism of the component. This has the advantage that a key will always hash to the same task ID for the lifetime of the topology, which means that recovering state for an instance after a crash or update is just a case of pulling the state linked to the keys in its task ID range.
>
> I know the above proposal has issues, not least placing a hard upper limit on the scale-out of a component, and that some alternative ideas are being floated for solving the stateful update issue. However, I just wanted to throw some more weight behind the Storm approach. There was a recent paper about high-performance network load balancing <https://blog.acolyer.org/2018/05/03/stateless-datacenter-load-balancing-with-beamer/> that describes an approach using a fixed key space similar to Storm's (see the section called Stable Hashing - they assign a range 100x the expected connection pool size, which we could do with Heron to prevent ever hitting the upper scaling limit). Also, this new load balancer, Beamer, claims to be twice as fast as Google's Maglev <https://blog.acolyer.org/2016/03/21/maglev-a-fast-and-reliable-software-network-load-balancer/>, which again uses a pre-defined key space and ID ranges to create look-up tables deterministically.
>
> I know a load balancer is a different beast to a stream grouping, but there are some interesting ideas in those papers. (The links point to summary blog posts, so you don't have to read the whole paper.)
>
> Anyway, I just thought I would throw those papers out there and see what people think.
>
> Tom Cooper
> W: www.tomcooper.org.uk | Twitter: @tomncooper <https://twitter.com/tomncooper>
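
For reference, the fixed key space idea in the quoted message could be sketched roughly as below. This is only an illustration, not Heron's or Storm's actual code: the names KEY_SPACE, task_id, task_ranges and instance_for are made up for the example, the 0-999 space is taken from the example in the message, and CRC32 stands in for whatever stable hash a real implementation would use.

```python
import zlib
from bisect import bisect_right

# Hypothetical fixed task-ID space (0-999), following the example in the email.
KEY_SPACE = 1000


def task_id(key, key_space=KEY_SPACE):
    """Map a key to a task ID. The result never depends on parallelism."""
    # CRC32 is an assumption chosen because it is stable across processes;
    # Python's built-in hash() is salted per process for strings.
    return zlib.crc32(key.encode("utf-8")) % key_space


def task_ranges(parallelism, key_space=KEY_SPACE):
    """Split the task-ID space into contiguous, near-equal ranges, one per instance."""
    if not 1 <= parallelism <= key_space:
        # The fixed key space is a hard upper limit on scale-out.
        raise ValueError("parallelism must be between 1 and key_space")
    base, extra = divmod(key_space, parallelism)
    ranges, start = [], 0
    for i in range(parallelism):
        size = base + (1 if i < extra else 0)  # first `extra` instances get one more ID
        ranges.append(range(start, start + size))
        start += size
    return ranges


def instance_for(key, parallelism, key_space=KEY_SPACE):
    """Return the index of the instance whose task-ID range contains the key's task ID."""
    starts = [r.start for r in task_ranges(parallelism, key_space)]
    return bisect_right(starts, task_id(key, key_space)) - 1
```

With this layout, changing the parallelism only moves the range boundaries. task_id("user-42") is the same before and after an update, so an instance could recover its state by loading whatever is stored under the task IDs in its new range.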