Re: [infinispan-dev] DIST.retrieveFromRemoteSource

Sanne Grinovero Wed, 25 Jan 2012 05:50:20 -0800

On 25 January 2012 13:41, Mircea Markus <[email protected]> wrote:
>
> On 25 Jan 2012, at 13:25, Sanne Grinovero wrote:
>
>> [cut]
>>>> I agree, we should not ask all replicas for the same information.
>>>> Asking only one is the opposite though: I think this should be a
>>>> configuration option to ask for any value between (1 and numOwner).
>>>> That's because I understand it might be beneficial to ask more than
>>>> one node immediately,
>>> why is it more beneficial to ask multiple members than a single one? I 
>>> guess it doesn't have to do with consistency, as in that case it would be 
>>> required (vs beneficial).
>>> Is it because one of the nodes might reply faster? I'm not sure that 
>>> compensates for the burden of numOwner-1 additional RPCs, but a benchmark will 
>>> tell us just that.
>>
>> One node might be busy doing GC and stay unresponsive for a whole
>> second or longer, another one might be actually crashed and you didn't
>> know that yet, these are unlikely but possible.
> All these are possible but I would rather consider them as exceptional 
> situations, possibly handled by a retry logic. We should *not* optimise for 
> these situations IMO.
> Thinking about our last performance results, we have avg 26k gets per 
> second. Now with numOwners = 2, this means that each node handles 26k 
> *redundant* gets every second: I'm not concerned about the network load, as 
> Bela mentioned in a previous mail the network link should not be the 
> bottleneck, but there's a huge unnecessary activity in OOB threads which 
> should rather be used for releasing locks or whatever needed. On top of that, 
> this consuming activity highly encourages GC pauses, as the effort for a get 
> is practically numOwners times higher than it should be.
>
>> More likely, a rehash is in progress, you could then be asking a node
>> which doesn't yet (or anymore) have the value.
>
> this is a consistency issue and I think we can find a way to handle it some 
> other way.
>>
>> All good reasons for which imho it makes sense to send out "a couple"
>> of requests in parallel, but I'm unlikely to want to send more than 2,
>> and I agree often 1 might be enough.
>> Maybe it should even optimize for the most common case: send out just
>> one, have a more aggressive timeout and in case of trouble ask for the
>> next node.
> +1
>>
>> In addition, sending a single request might spare us some Future,
>> await+notify messing in terms of CPU cost of sending the request.
> it's the remote OOB thread that's the most costly resource imo.


I think I agree on all points, it makes more sense.
Just that in a large cluster, let's say
1000 nodes, maybe I want 20 owners as a sweet spot for read/write
performance tradeoff, and with such high numbers I guess doing 2-3
gets in parallel might make sense, as those "unlikely" events suddenly
become almost certain, especially the rehash in progress.

So I'd propose a separate configuration option for the number of
parallel gets, and one to define a "try next node" policy. Alternatively,
this policy could be the whole strategy, with the number of gets being
one of the options of the default implementation.
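
To make the proposal a bit more concrete, here is a rough sketch of what
such a default strategy could look like. This is not Infinispan code: the
class StaggeredRemoteGet, its get() method and the parameters parallelGets,
timeoutMillis and remoteGet are all hypothetical names made up for
illustration. It asks the owners in batches of parallelGets, waits for a
short timeout per batch, and moves on to the next owners if nobody
answered with a value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

// Hypothetical sketch only; none of these names exist in Infinispan.
final class StaggeredRemoteGet {

    /**
     * Asks the owners in batches of parallelGets. Each batch gets at most
     * timeoutMillis to produce a non-null value; otherwise the next batch
     * of owners is tried ("try next node" policy).
     */
    static <N, V> Optional<V> get(List<N> owners, int parallelGets, long timeoutMillis,
                                  Function<N, CompletableFuture<V>> remoteGet)
            throws InterruptedException {
        if (parallelGets < 1) {
            throw new IllegalArgumentException("parallelGets must be >= 1");
        }
        for (int start = 0; start < owners.size(); start += parallelGets) {
            int end = Math.min(start + parallelGets, owners.size());
            CompletableFuture<V> firstValue = new CompletableFuture<>();
            List<CompletableFuture<Void>> handled = new ArrayList<>();
            for (N owner : owners.subList(start, end)) {
                // The first non-null reply in the batch wins.
                handled.add(remoteGet.apply(owner).thenAccept(value -> {
                    if (value != null) {
                        firstValue.complete(value);
                    }
                }));
            }
            // If every owner in the batch replied without a value (or failed),
            // stop waiting early instead of sitting out the whole timeout.
            CompletableFuture.allOf(handled.toArray(new CompletableFuture[0]))
                    .whenComplete((ignored, error) -> firstValue.complete(null));
            try {
                V value = firstValue.get(timeoutMillis, TimeUnit.MILLISECONDS);
                if (value != null) {
                    return Optional.of(value);
                }
            } catch (TimeoutException | ExecutionException e) {
                // Batch too slow (GC pause, crashed node): fall through to the next owners.
            }
        }
        return Optional.empty();
    }
}
```

With parallelGets = 1 and an aggressive timeout this is the "send out just
one, ask the next node in case of trouble" case; with numOwners = 20 one
could set parallelGets to 2 or 3 without touching the policy itself. Note
that the sketch does not cancel outstanding requests from a slow batch,
which a real implementation would probably want to do.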

_______________________________________________
infinispan-dev mailing list
[email protected]
https://lists.jboss.org/mailman/listinfo/infinispan-dev
