On Mon, Feb 22, 2010 at 2:05 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
> On Mon, Feb 22, 2010 at 1:53 PM, Ryan King <r...@twitter.com> wrote:
>> So, after having some more experience with HH, I've reformed my
>> opinion. I think we have 3 options:
>>
>> 1. Make the natural endpoints responsible for the hints.
>> 2. Make a random node responsible for hints.
>> 3. Get rid of HH.
>>
>> #1 reduces the "surprising effects in a small cluster" problem by
>> adding a marginal amount of resource demands to nodes that already
>> have the data we need.
>>
>> #2 will spread the load out. We had a node die last week and decided
>> to leave it down so that we could learn about the effects of this
>> situation. We eventually ended up killing the next node on the ring
>> with all the hints (I think there are some improvements to this in 0.6,
>> but I don't know if they'll be enough). So, even on a large cluster
>> (ours is currently 45 nodes), HH can have surprising effects on nodes
>> that neighbor a node that's down. Picking either a random node or
>> using the coordinator node for the hint would spread the load out.
>>
>> #3 is, I think, the right answer. It makes our system simpler and it
>> makes the behavior in failure conditions more predictable and safe.
>
> This is a good summary of the options.
>
> Why do you find 3 more compelling than 1?  Yes, it's simpler, but 1
> would not require a large change to the existing code, so perhaps we
> need a better case than that to justify removing a feature that
> already (mostly) works.

I think I find it more compelling because we're currently experiencing
pain related to HH. I'd be ok with keeping it as long as we can make
the effects of a down node less drastic.
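
To make the load argument concrete, here is a rough toy simulation of
where hints pile up when one node is down. It is only a sketch, not
Cassandra code: the function names, the integer ring positions, and the
three policy labels are made up for illustration, and "next_on_ring"
assumes every hint lands on the ring successor of the dead node, which
is a simplification of what we observed.

```python
import random
from collections import Counter


def natural_endpoints(primary, n_nodes, rf):
    """Replicas for a key: the primary plus the next rf-1 nodes on the ring."""
    return [(primary + i) % n_nodes for i in range(rf)]


def simulate(policy, n_nodes=45, rf=3, down=10, writes=10000, seed=0):
    """Count hints stored per node for one down node under a given policy.

    Hypothetical policies (labels are assumptions, not Cassandra settings):
      next_on_ring: every hint goes to the ring successor of the down node
      natural:      a live natural endpoint of the write keeps the hint
      random:       any live node keeps the hint
    """
    key_rng = random.Random(seed)        # picks which node owns each write
    place_rng = random.Random(seed + 1)  # picks where each hint is stored
    live = [n for n in range(n_nodes) if n != down]
    hints = Counter()
    for _ in range(writes):
        primary = key_rng.randrange(n_nodes)
        endpoints = natural_endpoints(primary, n_nodes, rf)
        if down not in endpoints:
            continue  # all replicas are up, so no hint is needed
        if policy == "next_on_ring":
            target = (down + 1) % n_nodes
        elif policy == "natural":
            target = place_rng.choice([e for e in endpoints if e != down])
        elif policy == "random":
            target = place_rng.choice(live)
        else:
            raise ValueError(f"unknown policy: {policy}")
        hints[target] += 1
    return hints


if __name__ == "__main__":
    for policy in ("next_on_ring", "natural", "random"):
        hints = simulate(policy)
        print(policy, "nodes holding hints:", len(hints),
              "max hints on one node:", max(hints.values()))
```

In this model the total number of hints is the same for every policy;
only the placement changes. One node absorbs everything under
"next_on_ring", the dead node's immediate neighbors share it under
"natural", and "random" spreads it across the whole cluster.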

-ryan
