On Mon, Feb 22, 2010 at 1:53 PM, Ryan King <r...@twitter.com> wrote:
> So, after having some more experience with HH, I've reformed my
> opinion. I think we have 3 options:
>
> 1. Make the natural endpoints responsible for the hints.
> 2. Make a random node responsible for hints.
> 3. Get rid of HH.
>
> #1 reduces the "surprising effects in a small cluster" problem by
> adding a marginal amount of resource demands to nodes that already
> have the data we need.
>
> #2 will spread the load out. We had a node die last week and decided
> to leave it down so that we could learn about the effects of this
> situation. We eventually ended up killing the next node on the ring
> with all the hints (I think there are some improvements to this in 0.6,
> but I don't know if they'll be enough). So, even on a large cluster
> (ours is currently 45 nodes), HH can have surprising effects on nodes
> that neighbor a node that's down. Picking either a random node or
> using the coordinator node for the hint would spread the load out.
>
> #3 is, I think, the right answer. It makes our system simpler and it
> makes the behavior in failure conditions more predictable and safe.
This is a good summary of the options. Why do you find 3 more compelling than 1? Yes, it's simpler, but 1 would not require a large change to the existing code, so perhaps we need a better case than that to justify removing a feature that already (mostly) works.

-Jonathan
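The load-concentration effect Ryan describes under #2 can be illustrated with a toy simulation. This is a minimal sketch under loudly stated assumptions, not Cassandra's hinted-handoff code: `place_hints` and the `nodeN` names are hypothetical, the ring is a plain list of 45 nodes (matching the cluster size mentioned in the thread), and every simulated write is assumed to have the dead node as a replica, so each write produces exactly one hint.

```python
import random
from collections import Counter


def place_hints(ring, dead, num_writes, strategy, rng):
    """Hypothetical model: return a Counter mapping node -> hints it stores.

    Assumes exactly one node (`dead`) is down and that every one of the
    `num_writes` writes needed `dead` as a replica.
    """
    live = [node for node in ring if node != dead]
    holders = Counter()
    if strategy == "next_on_ring":
        # Every hint lands on the node immediately clockwise from `dead`
        # (always live here, since only one node is down).
        i = ring.index(dead)
        neighbor = ring[(i + 1) % len(ring)]
        holders[neighbor] += num_writes
    elif strategy == "random":
        # Each hint goes to an independently chosen live node (option #2).
        for _ in range(num_writes):
            holders[rng.choice(live)] += 1
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return holders


ring = [f"node{i}" for i in range(45)]
rng = random.Random(42)

neighbor = place_hints(ring, "node10", 10_000, "next_on_ring", rng)
spread = place_hints(ring, "node10", 10_000, "random", rng)

print(neighbor.most_common(1))  # [('node11', 10000)]
print(max(spread.values()))     # the busiest node holds only a small share
```

In the "next_on_ring" model a single neighbor absorbs every hint for as long as the dead node stays down, while the random strategy spreads the same hints across the 44 live nodes. The sketch deliberately omits option #1 (hinting on the natural endpoints), since modeling it would require assumptions about replica placement that the thread does not spell out.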