Hi Tirthankar,

> let_partition_wait is to return true if the node running it is in
> the "smaller" partition, right?
>
> Yes, where the definition of "smaller" changes according to the number
> of nodes configured, i.e. if n is the number of nodes configured, a
> smaller partition may be much less than n/2.
> [...]
> In your code, the definition of "large" is fixed. In my code, the
> definition of "small" is variable.
Agree. But I would prefer a closed form as the definition of what is
considered a small/large partition. The numbers you have chosen seem
somewhat arbitrary, and I believe it would be hard to show that exactly
those numbers are a good choice. If you could come up with a formula and
some sound reasoning behind it, this would be much easier to follow.

> Now we will have 3 partitions and we do not want to delay
> the partition which has the bare minimal acceptable number of nodes,
> which differs depending on the number of nodes configured.

What happens if the remaining partition does not have the minimum number
of nodes? How long will the delay be? Have you tested the scenario where
a cluster has only "small" partitions left?

> What happens if there is no larger partition, for instance if nodes
> were taken down administratively
>
> A node that is taken down administratively is no longer part of
> the cluster. Hence there is no issue.

The documented procedure
http://docs.sun.com/app/docs/doc/819-2971/z4000076997776?l=en&a=view
is to evacuate a node for maintenance. Unconfiguring it would be too much
of a burden for the admin, IMHO. So, unless there is new functionality I
don't know about, there is no state information available that marks a
cluster node as "not available". In a partitioning scenario, we must
assume a node is offline if we cannot communicate with it.

So in short, IMHO there is currently no practical way to reliably
determine the total cluster size in a partitioning situation. Am I wrong?
I'd be glad if I was, and if I am, please help me understand.

If I am right, it could help to add a node property indicating whether or
not the node is available. IMHO, this would also help administrators in
handling defective hardware, test scenarios, node-local s/w issues, etc.

> What is implicit
> is that most clusters are of 4 nodes, hence the logic works for most of
> the cases.
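To make the "closed form" request above concrete, here is one illustrative candidate: delay reconfiguration in any partition that holds fewer than a strict majority of the configured nodes. This is only a sketch of the kind of formula I mean; the function names (`majority`, `should_delay`) are hypothetical and not taken from the actual patch.

```python
def majority(total_nodes: int) -> int:
    """Smallest partition size that is a strict majority of the
    configured cluster: floor(n/2) + 1."""
    return total_nodes // 2 + 1

def should_delay(partition_size: int, total_nodes: int) -> bool:
    """Hypothetical closed-form replacement for hard-coded size
    thresholds: a partition waits iff it holds fewer than a
    majority of the configured nodes."""
    return partition_size < majority(total_nodes)

# In a 4-node cluster, a 2-node partition is "small" and waits,
# while a 3-node partition may proceed:
#   should_delay(2, 4) -> True
#   should_delay(3, 4) -> False
```

A rule like this is easy to reason about for any cluster size, instead of only working well for the common 4-node case.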
I disagree with using such an assumption as the basis for a particular
implementation. Sun may have internal statistics about cluster sizes
deployed in the field, but as long as the product is supported for other
sizes as well, it should work well for all of them.

> This is a heuristic algorithm that I am trying to apply. Hence, as any
> heuristic algorithm, it tries to solve the problem by coming very close
> to the best possible solution.

I do agree with this approach, and I understand that your change will
improve a particular scenario. I am only worried that it could have
negative effects in other scenarios. I have seen enough of those in the
past, so I'd be grateful if you could clarify my remaining questions.

Nils
--
This message posted from opensolaris.org
