On Mon, 16 Jan 2006, Jules Gosnell wrote:
> >2. When an HTTP request arrives, if the cluster which received it does
> >not have R copies then it blocks (it waits until there are.) This
> >should work in data centers because partitions are likely to be very
> >short-lived (aka virtual partitions, which are due to congestion, not
> >to any hardware issue.)
> >
> Interesting. I was intending to actively repopulate the cluster
> fragment, as soon as the split was detected. I figure that
> - the longer that sessions spend without their full complement of
> backups, the more likely that a further failure may result in data loss.
> - the split is an exceptional circumstance at which you would expect to
> pay an exceptional cost (regenerating missing primaries from backups and
> vice-versa)
>
> by waiting for a request to arrive for a session before ensuring it has
> its correct complement of backups, you extend the time during which it
> is 'at risk'. By doing this 'lazily', you will also have to perform an
> additional check on every request arrival, which you would not have to
> do if you had regenerated missing state at the point that you noticed
> the split.

Actually I didn't mean to say that you should do it lazily. You should
most definitely do it aggressively, but I would not try to do _all_ the
state transfer ASAP, because this can kill availability. If I had to do
the state transfer using Totem I would use priority queues, so that you
know that while the system is doing state transfer it is still operating
at, say, 80% efficiency.

It was not about lazy vs. greedy.

I believe that if you put some spare capacity in your cluster you will
get good availability. For example, if your minimum R is 2 and the normal
operating value is 4, when a node fails you will not be frantically doing
state transfer.

> >3. If at any time an HTTP request reaches a server which does not have
> >itself a replica of the session it sends a client redirect to a node
> >which does.
> >
> WADI can relocate request to session, as you suggest (via redirect or
> proxy), or session to request, by migration. Relocation of request
> should scale better since requests are generally smaller and, in the web
> tier, may run concurrently through the same session, whereas sessions
> are generally larger and may only be migrated serially (since only one
> copy at a time may be 'active').

I would also just send a redirect. I don't think it's worth relocating a
session.

> > and possibly migration of some session for
> >proper load balancing.
> >
> forcing the balancing of state around the cluster is something that I
> have considered with WADI, but not yet tried to implement. The type of
> load-balancer that is being used has a big impact here. If you cannot
> communicate a change of session location satisfactorily to the Http load
> balancer, then you have to just go with wherever it decides a session is
> located.... With SFSBs we should have much more control at the client
> side, so this becomes a real option.

In my opinion load balancing is not something that a cluster api can
address effectively. Half the problem is evaluating how busy the system
is in the first place.

> all in all, though, it sounds like we see pretty much eye to eye :-)

Better than the other way ..

> the lazy partition regeneration is an interesting idea and this is the
> second time it has been suggested to me, so I will give it some serious
> thought.

Again, I wasn't advocating lazy state transfer. But perhaps it has
applications somewhere.

> Thanks for taking the time to share your thoughts,

No problem.
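
For what it's worth, the priority-queue idea for state transfer could be
sketched roughly as below. This is only an illustration, not WADI or Totem
code: the function name drain_tick, the per-tick slot count, and the 20%
transfer share (leaving about 80% for normal traffic) are all made-up
assumptions. Sessions with the fewest surviving replicas are transferred
first, and transfer traffic only exceeds its share when there is no
request traffic waiting.

```python
import heapq
from collections import deque


def drain_tick(app_queue, transfer_heap, slots=10, transfer_share=0.2):
    """Pick the messages to send in one scheduling tick (hypothetical helper).

    app_queue:     deque of ordinary request messages (FIFO).
    transfer_heap: heapq-ordered list of (replica_count, session_id); the
                   session with the fewest replicas is the most urgent.
    slots:         assumed number of messages the node can send per tick.
    transfer_share: assumed fraction of slots reserved for state transfer.
    """
    transfer_cap = int(slots * transfer_share)
    sent = []

    # 1. State transfer gets its reserved share, most at-risk sessions first.
    while transfer_heap and len(sent) < transfer_cap:
        _, session_id = heapq.heappop(transfer_heap)
        sent.append(("transfer", session_id))

    # 2. Ordinary request traffic takes the remaining slots.
    while app_queue and len(sent) < slots:
        sent.append(("request", app_queue.popleft()))

    # 3. If requests did not fill the tick, give the spare slots to transfer.
    while transfer_heap and len(sent) < slots:
        _, session_id = heapq.heappop(transfer_heap)
        sent.append(("transfer", session_id))

    return sent
```

With a busy request queue, each tick in this sketch carries at most two
transfer messages out of ten, so the node keeps serving requests while the
backups are rebuilt in the background.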

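The redirect rule from point 3 could look something like the sketch
below. It is purely illustrative: route_request, the replica map, and the
node URLs are hypothetical names, not part of WADI or any real load
balancer API. A node serves the request if it holds a replica of the
session, and otherwise points the client at a node that does.

```python
def route_request(session_id, local_node, replicas):
    """Decide how a node should handle a request (hypothetical helper).

    replicas: dict mapping session_id -> list of node URLs holding a replica.
    Returns an (action, target) pair:
      ("serve", local_node)  the local node has a replica
      ("redirect", node)     send the client an HTTP redirect to `node`
      ("unknown", None)      no replica is known anywhere
    """
    holders = replicas.get(session_id, [])
    if local_node in holders:
        return ("serve", local_node)
    if holders:
        # Assumption: any replica holder will do; pick the first one listed.
        return ("redirect", holders[0])
    return ("unknown", None)
```

What to do in the "unknown" case (create a new session, or block until a
replica shows up) is left open here.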