On Mon, Feb 22, 2010 at 2:05 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
> On Mon, Feb 22, 2010 at 1:53 PM, Ryan King <r...@twitter.com> wrote:
>> So, after having some more experience with HH, I've reformed my
>> opinion. I think we have 3 options:
>>
>> 1. Make the natural endpoints responsible for the hints.
>> 2. Make a random node responsible for hints.
>> 3. Get rid of HH.
>>
>> #1 reduces the "surprising effects in a small cluster" problem by
>> adding a marginal amount of resource demands to nodes that already
>> have the data we need.
>>
>> #2 will spread the load out. We had a node die last week and decided
>> to leave it down so that we could learn about the effects of this
>> situation. We eventually ended up killing the next node on the ring
>> with all the hints (I think there are some improvements to this in 0.6,
>> but I don't know if they'll be enough). So, even on a large cluster
>> (ours is currently 45 nodes), HH can have surprising effects on nodes
>> that neighbor a node that's down. Picking either a random node or
>> using the coordinator node for the hint would spread the load out.
>>
>> #3 is, I think, the right answer. It makes our system simpler and it
>> makes the behavior in failure conditions more predictable and safe.
>
> This is a good summary of the options.
>
> Why do you find 3 more compelling than 1? Yes, it's simpler, but 1
> would not require a large change to the existing code, so perhaps we
> need a better case than that to justify removing a feature that
> already (mostly) works.
I think I find it more compelling because we're currently experiencing pain related to HH. I'd be ok with keeping it as long as we can make the effects of a node going down less drastic.

-ryan
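As a rough illustration of the load argument in #2 (a toy sketch, not Cassandra's actual hinted handoff code), the Python below models a 45-node ring with one node down and counts how many hints each live node would store under three placement rules. The ring layout, replication factor of 3, uniform key distribution, and the names `replicas_for` and `simulate_hints` are all hypothetical assumptions made for illustration.

```python
import random
from collections import Counter


def replicas_for(key_pos, n_nodes, rf):
    """Hypothetical placement: the node at key_pos is the primary replica and
    the next rf - 1 nodes clockwise hold the other replicas."""
    return [(key_pos + i) % n_nodes for i in range(rf)]


def simulate_hints(strategy, n_nodes=45, rf=3, down=0, writes=10_000, seed=42):
    """Toy model (assumption, not Cassandra's implementation): count how many
    hints each live node stores while node `down` is offline."""
    keys_rng = random.Random(seed)      # drives which keys are written
    pick_rng = random.Random(seed + 1)  # drives hint-target choices only
    live = [n for n in range(n_nodes) if n != down]
    hints = Counter()
    for _ in range(writes):
        replicas = replicas_for(keys_rng.randrange(n_nodes), n_nodes, rf)
        if down not in replicas:
            continue  # every replica is up, so no hint is needed
        if strategy == "next_on_ring":
            # the first node clockwise after the dead one takes every hint
            target = (down + 1) % n_nodes
        elif strategy == "natural_endpoints":
            # option #1: a live replica that already holds this data
            target = pick_rng.choice([r for r in replicas if r != down])
        elif strategy == "random":
            # option #2: any live node in the cluster
            target = pick_rng.choice(live)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        hints[target] += 1
    return hints


if __name__ == "__main__":
    for name in ("next_on_ring", "natural_endpoints", "random"):
        counts = simulate_hints(name)
        print(name, "busiest node holds", max(counts.values()),
              "of", sum(counts.values()), "hints")
```

Because key writes and target choices use separate random streams, all three rules see the same set of hinted writes. Under these assumptions, `next_on_ring` sends every hint to node 1, `natural_endpoints` splits them among the four live neighbors that share replicas with the dead node, and `random` spreads them over all 44 live nodes. If coordinators were also chosen uniformly at random, using the coordinator for the hint would behave like the `random` rule in this model.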