I see. Then the doc I was reading might not be it either. I will ask maosong then.
Thanks for the info! On Mon, May 7, 2018 at 9:23 AM, Bill Graham <billgra...@gmail.com> wrote: > Yeah, that's not it. The stateful scaling part of that doc got lengthy > enough that we broke it into a doc of it's own, per Sanjeev's suggestion > IIRC. The fact that I can't locate it makes me think it was a twitter doc > of mine (although it was not Twitter-specific), which I'm sure was shared > with Sanjeev, Maosong and probably Karthik. If you can find it, please > share. > > On Mon, May 7, 2018 at 12:41 AM, Ning Wang <wangnin...@gmail.com> wrote: > >> Thanks Karthik. The doc is not exactly the same but close enough. >> >> It seems my doc is an internal one so let's use your doc as reference. I >> will see if there is any major differences and comment. >> >> On Sun, May 6, 2018 at 1:22 PM, Karthik Ramasamy <kart...@streaml.io> >> wrote: >> >>> Here it is >>> >>> https://docs.google.com/document/d/1YDFNvLTX6Sg3WDrNFKiWLaJv >>> uEtK4eyxEaA0w9cVlG4/edit#heading=h.d6uy2uxfs2xq >>> >>> cheers >>> /karthik >>> >>> >>> On Sun, May 6, 2018 at 8:20 AM, Bill Graham <billgra...@gmail.com> >>> wrote: >>> >>>> Can you share the doc please? >>>> >>>> On Sat, May 5, 2018 at 4:18 PM Ning Wang <wangnin...@gmail.com> wrote: >>>> >>>> > Thanks. >>>> > >>>> > Yeah I have read the design doc. It has a section for scaling and >>>> covers >>>> > some designs but not reaching this level of details I am afraid. >>>> > >>>> > On Sat, May 5, 2018 at 9:45 AM, Bill Graham <billgra...@gmail.com> >>>> wrote: >>>> > >>>> >> The stateful processing design included a large section on scaling, >>>> which >>>> >> was intended to be done as a future phase. It's very similar to >>>> what's >>>> >> being described. Sanjeev and I worked on it about a 1.5 years ago >>>> with >>>> >> Maosong and it was in a google doc. Sanjeev do you have that design >>>> doc? I >>>> >> can't seem locate it. >>>> >> >>>> >> On Sat, May 5, 2018 at 12:03 AM, Ning Wang <wangnin...@gmail.com> >>>> wrote: >>>> >> >>>> >> > If we go this way, we need key -> state map for each component so >>>> that >>>> >> the >>>> >> > state data can be repartitioned. >>>> >> > >>>> >> > On Fri, May 4, 2018 at 11:44 PM, Karthik Ramasamy < >>>> kart...@streaml.io> >>>> >> > wrote: >>>> >> > >>>> >> > > Instead - if it references >>>> >> > > >>>> >> > > topology name + component name + key range >>>> >> > > >>>> >> > > will it be better? >>>> >> > > >>>> >> > > cheers >>>> >> > > /karthik >>>> >> > > >>>> >> > > >>>> >> > > On Fri, May 4, 2018 at 11:23 PM, Ning Wang <wangnin...@gmail.com >>>> > >>>> >> wrote: >>>> >> > > >>>> >> > > > Currently I think each Instance serializes the state object >>>> into a >>>> >> byte >>>> >> > > > array and checkpoint manager saves the byte array into a file. >>>> The >>>> >> file >>>> >> > > is >>>> >> > > > referenced by topology name + component name + instance id. >>>> >> > > > >>>> >> > > > On Fri, May 4, 2018 at 11:10 PM, Karthik Ramasamy < >>>> >> kart...@streaml.io> >>>> >> > > > wrote: >>>> >> > > > >>>> >> > > > > I am not sure I understand why the state is tied to an >>>> instance? >>>> >> > > > > >>>> >> > > > > cheers >>>> >> > > > > /karthik >>>> >> > > > > >>>> >> > > > > On Fri, May 4, 2018 at 4:36 PM, Thomas Cooper < >>>> >> > tom.n.coo...@gmail.com> >>>> >> > > > > wrote: >>>> >> > > > > >>>> >> > > > > > Yeah, state recovery is a bit more difficult with Heron's >>>> >> > > architecture. >>>> >> > > > > In >>>> >> > > > > > Storm, the task IDs are not just values used for routing >>>> they >>>> >> > > actually >>>> >> > > > > > equate to a task instance within the executor. An executor >>>> which >>>> >> > > > > currently >>>> >> > > > > > processes the keys 4-8 actually contains 5 task instances >>>> of the >>>> >> > same >>>> >> > > > > > component. So for each task, they just save its state >>>> attached >>>> >> to >>>> >> > the >>>> >> > > > > > single task ID and reassemble executors with the new task >>>> >> > instances. >>>> >> > > > > > >>>> >> > > > > > We don't want or have to do that with Heron instances but >>>> we >>>> >> would >>>> >> > > need >>>> >> > > > > to >>>> >> > > > > > have some way to have a state change tied to the task (or >>>> >> routing >>>> >> > key >>>> >> > > > if >>>> >> > > > > we >>>> >> > > > > > go to the key range idea). For something like a word count >>>> you >>>> >> > might >>>> >> > > > save >>>> >> > > > > > counts using a nested map like: { routing key : {word : >>>> count >>>> >> }}. >>>> >> > The >>>> >> > > > > > routing key could be included in the Tuple instance. >>>> However, >>>> >> > whether >>>> >> > > > > this >>>> >> > > > > > pattern would work for more generic state cases I don't >>>> know? >>>> >> > > > > > >>>> >> > > > > > Tom Cooper >>>> >> > > > > > W: www.tomcooper.org.uk | Twitter: @tomncooper >>>> >> > > > > > <https://twitter.com/tomncooper> >>>> >> > > > > > >>>> >> > > > > > >>>> >> > > > > > On Fri, 4 May 2018 at 15:54, Neng Lu <freen...@gmail.com> >>>> >> wrote: >>>> >> > > > > > >>>> >> > > > > > > +1 for this idea. As long as the predefined key space is >>>> large >>>> >> > > > enough, >>>> >> > > > > it >>>> >> > > > > > > should work for most of the cases. >>>> >> > > > > > > >>>> >> > > > > > > Based on my experience with topologies, I never saw one >>>> >> component >>>> >> > > has >>>> >> > > > > > more >>>> >> > > > > > > than 1000 instances in a topology. >>>> >> > > > > > > >>>> >> > > > > > > For recovering states from an update, there will be some >>>> >> problems >>>> >> > > > > though. >>>> >> > > > > > > Since the states stored in heron are strongly connected >>>> with >>>> >> each >>>> >> > > > > > instance, >>>> >> > > > > > > we either need to have >>>> >> > > > > > > some resolver does the state repartitioning or stores >>>> states >>>> >> with >>>> >> > > the >>>> >> > > > > key >>>> >> > > > > > > instead of with each instance. >>>> >> > > > > > > >>>> >> > > > > > > >>>> >> > > > > > > >>>> >> > > > > > > On Fri, May 4, 2018 at 3:01 PM, Karthik Ramasamy < >>>> >> > > > kramas...@gmail.com> >>>> >> > > > > > > wrote: >>>> >> > > > > > > >>>> >> > > > > > > > Thanks for sharing. I like the Storm approach >>>> >> > > > > > > > >>>> >> > > > > > > > - keeps the implementation simpler >>>> >> > > > > > > > - state is deterministic across restarts >>>> >> > > > > > > > - makes it easy to reason and debug >>>> >> > > > > > > > >>>> >> > > > > > > > The hard limit is not a problem at all since most of >>>> the >>>> >> > > topologies >>>> >> > > > > > will >>>> >> > > > > > > > be never that big. >>>> >> > > > > > > > If you can handle Twitter topologies cleanly, it is >>>> more >>>> >> that >>>> >> > > > > > sufficient >>>> >> > > > > > > I >>>> >> > > > > > > > believe. >>>> >> > > > > > > > >>>> >> > > > > > > > cheers >>>> >> > > > > > > > /karthik >>>> >> > > > > > > > >>>> >> > > > > > > > > On May 4, 2018, at 2:31 PM, Thomas Cooper < >>>> >> > > > tom.n.coo...@gmail.com> >>>> >> > > > > > > > wrote: >>>> >> > > > > > > > > >>>> >> > > > > > > > > Hi all, >>>> >> > > > > > > > > >>>> >> > > > > > > > > A while ago I emailed about the issue of how fields >>>> (key) >>>> >> > > grouped >>>> >> > > > > > > routing >>>> >> > > > > > > > > in Heron was not consistent across an update and how >>>> this >>>> >> > makes >>>> >> > > > > > > > preserving >>>> >> > > > > > > > > state across an update very difficult and also makes >>>> it >>>> >> > > > > > > > > difficult/impossible to analyse or predict tuple >>>> flows >>>> >> > through >>>> >> > > a >>>> >> > > > > > > > > current/proposed topology physical plan. >>>> >> > > > > > > > > >>>> >> > > > > > > > > I suggested adopting Storms approach of pre-defining >>>> a >>>> >> > routing >>>> >> > > > key >>>> >> > > > > > > > > space for each component (eg 0-999), so that instead >>>> of an >>>> >> > > > instance >>>> >> > > > > > > > having >>>> >> > > > > > > > > a single task id that gets reset at every update (eg >>>> 10) >>>> >> it >>>> >> > > has a >>>> >> > > > > > range >>>> >> > > > > > > > of >>>> >> > > > > > > > > id's (eg 10-16) that changes depending on the >>>> parallelism >>>> >> of >>>> >> > > the >>>> >> > > > > > > > component. >>>> >> > > > > > > > > This has the advantage that a key will always hash >>>> to the >>>> >> > same >>>> >> > > > task >>>> >> > > > > > ID >>>> >> > > > > > > > for >>>> >> > > > > > > > > the lifetime of the topology. Meaning recovering >>>> state >>>> >> for an >>>> >> > > > > > instance >>>> >> > > > > > > > > after a crash or update is just a case of pulling the >>>> >> state >>>> >> > > > linked >>>> >> > > > > to >>>> >> > > > > > > the >>>> >> > > > > > > > > keys in its task ID range. >>>> >> > > > > > > > > >>>> >> > > > > > > > > I know the above proposal has issues, not least of >>>> all >>>> >> > placing >>>> >> > > a >>>> >> > > > > hard >>>> >> > > > > > > > upper >>>> >> > > > > > > > > limit on the scale out of a component, and that some >>>> >> > > alternative >>>> >> > > > > > ideas >>>> >> > > > > > > > are >>>> >> > > > > > > > > being floated for solving the stateful update issue. >>>> >> > However, I >>>> >> > > > > just >>>> >> > > > > > > > wanted >>>> >> > > > > > > > > to throw some more weight behind the Storm approach. >>>> There >>>> >> > was >>>> >> > > a >>>> >> > > > > > recent >>>> >> > > > > > > > > paper about high-performance network load balancing >>>> >> > > > > > > > > <https://blog.acolyer.org/2018/05/03/stateless- >>>> >> > > > > > > > datacenter-load-balancing-with-beamer/>that >>>> >> > > > > > > > > describes an approach using a fixed key space >>>> similar to >>>> >> > > Storm's >>>> >> > > > > (see >>>> >> > > > > > > the >>>> >> > > > > > > > > section called Stable Hashing - they assign a range >>>> 100x >>>> >> the >>>> >> > > > > expected >>>> >> > > > > > > > > connection pool size - which we could do with heron >>>> to >>>> >> > prevent >>>> >> > > > ever >>>> >> > > > > > > > hitting >>>> >> > > > > > > > > the upper scaling limit). Also, this new load >>>> balancer, >>>> >> > Beamer, >>>> >> > > > > > claims >>>> >> > > > > > > to >>>> >> > > > > > > > > be twice as fast as Google's Maglev >>>> >> > > > > > > > > <https://blog.acolyer.org/2016 >>>> /03/21/maglev-a-fast-and- >>>> >> > > > > > > > reliable-software-network-load-balancer/> >>>> >> > > > > > > > > which again uses a pre-defined keyspace and ID >>>> ranges to >>>> >> > create >>>> >> > > > > > look-up >>>> >> > > > > > > > > tables deterministically. >>>> >> > > > > > > > > >>>> >> > > > > > > > > I know a load balancer is a different beast to a >>>> stream >>>> >> > > grouping >>>> >> > > > > but >>>> >> > > > > > > > there >>>> >> > > > > > > > > are some interesting ideas in those papers (The links >>>> >> point >>>> >> > to >>>> >> > > > > > summary >>>> >> > > > > > > > blog >>>> >> > > > > > > > > posts so you don't have to read the whole paper). >>>> >> > > > > > > > > >>>> >> > > > > > > > > Anyway, I just thought I would those papers out >>>> there and >>>> >> see >>>> >> > > > what >>>> >> > > > > > > people >>>> >> > > > > > > > > think. >>>> >> > > > > > > > > >>>> >> > > > > > > > > Tom Cooper >>>> >> > > > > > > > > W: www.tomcooper.org.uk | Twitter: @tomncooper >>>> >> > > > > > > > > <https://twitter.com/tomncooper> >>>> >> > > > > > > > >>>> >> > > > > > > > >>>> >> > > > > > > >>>> >> > > > > > >>>> >> > > > > >>>> >> > > > >>>> >> > > >>>> >> > >>>> >> >>>> > >>>> > -- >>>> Sent from Gmail Mobile >>>> >>> >>> >> >