Hi Tirthankar,

>     let_partition_wait is to return true if the node running it is in
>     the "smaller" partition, right?
> 
> Yes where the definition of "smaller" changes according to the number of 
> nodes configured. i.e. if n is the number of nodes configured, a smaller 
> partition may be much less than n/2.
> [...]
> In your code, the definition of "large" is fixed. In my code, the 
> definition of "small" is variable.

Agreed. But I would prefer a closed-form definition of what counts as a 
small/large partition. The numbers you have chosen seem somewhat arbitrary, 
and I believe it would be hard to show that exactly those numbers are a good 
choice. If you could come up with a formula and some sound reasoning behind 
it, this would be much easier to follow.
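To illustrate what I mean by a closed form, here is one possible rule, sketched in Python. This is only an example of the kind of formula I have in mind, not the code from your patch: call a partition "small" unless it holds a strict majority of the configured nodes.

```python
def is_small_partition(partition_size: int, configured_nodes: int) -> bool:
    """One possible closed-form rule (an illustration, not the patch's
    logic): with n configured nodes, a partition is "small" unless it
    holds a strict majority, i.e. more than floor(n/2) nodes.
    """
    return partition_size <= configured_nodes // 2
```

For n = 4 this makes partitions of size 1 and 2 "small" and 3 and 4 "large"; note that a 2-2 split leaves both sides small, a tie the heuristic would still need to break some other way.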

> Now we will have 3 partitions and we do not want to delay 
> the partition which has the bare minimal acceptable number of nodes, 
> which differs depending on the number of nodes configured.

What happens if the remaining partition does not have the minimum number of 
nodes? How long will the delay be? Have you tested the scenario where a cluster 
has only "small" partitions left?

>     What happens if there is no larger partition, for instance if nodes
>     were taken down administratively
> 
> Node that is taken down for administrative function is no more a part of 
> the cluster. Hence there is no issue.

The documented procedure 
http://docs.sun.com/app/docs/doc/819-2971/z4000076997776?l=en&a=view
is to evacuate a node for maintenance. Unconfiguring it would be too much of 
a burden for the admin, IMHO.

So, unless there is new functionality I am not aware of, there is no state 
information available that marks a cluster node as "not available". In the 
partitioning scenario, we must assume a node is offline if we cannot 
communicate with it.

So in short, IMHO there is currently no practical way to reliably determine the 
total cluster size in a partitioning situation.

Am I wrong? I'd be glad if I was and if I am, please help me understand.

If I am right, it could help to add a node property indicating whether or not 
the node is available. IMHO, this would also help administrators in handling 
defective hardware, test scenarios, node-local s/w issues etc.
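To make the suggestion concrete, here is a small sketch of what such a property could look like. All names here are invented for illustration; no such property exists in the product today, as far as I know:

```python
from dataclasses import dataclass

@dataclass
class Node:
    """Hypothetical node record with an admin-maintained availability flag.

    The admin would clear `available` before taking the node down for
    maintenance, so partition logic could distinguish "evacuated on
    purpose" from "unreachable".
    """
    name: str
    available: bool = True

def effective_cluster_size(nodes) -> int:
    """Total cluster size counting only nodes marked available."""
    return sum(1 for n in nodes if n.available)

# Three nodes configured, one evacuated for maintenance:
nodes = [Node("pnode1"), Node("pnode2"), Node("pnode3", available=False)]
```

With the third node marked unavailable, the effective size is 2, so a partition holding the two remaining nodes would be a clear majority even though three nodes are configured.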

> What is implicit 
> is that most clusters are of 4 node, hence the logic works for most of 
> the cases.

I disagree with using such an assumption as the basis for a particular 
implementation. Sun may well have internal statistics about cluster sizes 
deployed in the field, but as long as the product is supported for other 
sizes as well, it should work well for all of them.

> This is a heuristic algorithm that I am trying to apply. Hence as any 
> heuristic algorithm, it tries to solve the problem by coming very close 
> to the best possible solution.

I do agree with this approach, and I understand that your change will improve a 
particular scenario. I am only worried that it could have negative effects in 
other scenarios. I have seen enough of those in the past, so I'd be grateful 
if you could clarify my remaining questions.

Nils
--

This message posted from opensolaris.org

