On 10/31/13 11:20 PM, Sanne Grinovero wrote:
> On 31 October 2013 20:07, Mircea Markus <[email protected]> wrote:
>>
>> On Oct 31, 2013, at 3:45 PM, Dennis Reed <[email protected]> wrote:
>>
>>> On 10/31/2013 02:18 AM, Bela Ban wrote:
>>>>
>>>>> Also, if we did have read-only, what criteria would cause those nodes
>>>>> to become writable again?
>>>> Once you become the primary partition, e.g. when a view is received
>>>> where view.size() >= N, where N is a predefined threshold. The criterion
>>>> can be different, as long as it is deterministic.
>>>>
>>>>> There is no guarantee when the other nodes will ever come back up, or
>>>>> whether there will ever be additional ones anytime soon.
>>>> If a system picks the primary partition approach, then it can become
>>>> completely inaccessible (read-only). In this case, I envisage that a
>>>> sysadmin will be notified, who can then start additional nodes so that
>>>> the system acquires the primary partition and becomes accessible again.
>>>
>>> There should be a way to manually modify the primary partition status.
>>> So if the admin knows the nodes will never return, they can manually
>>> enable the partition.
>>
>> The status will be exposed through JMX at any point, regardless of
>> whether a split brain is going on or not.
>>
>>> Also, the PartitionContext should know whether the nodes left normally
>>> or not.
>>> If you have 5 nodes in a cluster and you shut down 3 of them, you'll
>>> want the remaining two to stay available.
>>> But if there was a network partition, you wouldn't. So it needs to know
>>> the difference.
>>
>> Very good point again.
>> Thank you Dennis!
>
> Let's clarify. If 3 nodes out of 5 are killed without a
> reconfiguration, you do NOT want the remaining two to remain available
> unless explicitly told so by an admin. It is not possible to
> automatically distinguish 3 nodes being shut down from 3 crashed nodes.
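To make the view.size() >= N criterion I mentioned above a bit more concrete,
here is a rough sketch of such a deterministic check built on a JGroups
ReceiverAdapter. The class name, the readOnly flag and forceAvailable() are
invented for the example; this is not the API from the design wiki:

import org.jgroups.ReceiverAdapter;
import org.jgroups.View;

// Rough sketch only: a deterministic "primary partition" check based on the
// view size. ThresholdPartitionListener, requiredMembers, readOnly and
// forceAvailable() are invented names for this example.
public class ThresholdPartitionListener extends ReceiverAdapter {

    private final int requiredMembers;          // N, configured by the admin
    private volatile boolean readOnly = true;   // start out pessimistic

    public ThresholdPartitionListener(int requiredMembers) {
        this.requiredMembers = requiredMembers;
    }

    @Override
    public void viewAccepted(View view) {
        // Every member evaluates the same view, so all members that stay
        // together reach the same (deterministic) conclusion.
        readOnly = view.size() < requiredMembers;
    }

    // Manual override for the case Dennis describes: the admin knows the
    // missing nodes will never come back. Could be exposed as a JMX operation.
    public void forceAvailable() {
        readOnly = false;
    }

    public boolean isReadOnly() {
        return readOnly;
    }
}

forceAvailable() is the kind of manual override Dennis asks for; it could sit
next to the JMX status attribute Mircea mentions.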
As for telling a clean shutdown apart from a crash: we could determine that a
node left *gracefully* by sending an RPC before leaving. But in all other
cases we don't know whether a node got partitioned away or whether it crashed.
For the graceful-leave case, we could say that we are allowed to go below the
read-only threshold and remain available. This would increase overall
availability a bit; there is a small sketch of this at the end of this mail.

> In our face-to-face meeting we did point out that an admin needs hooks
> to be able to:
> - specify how many nodes are expected in the full system (and adapt
>   this dynamically)
> - issue an admin command to "cleanly shut down" a node (this was also
>   discussed as a strong requirement in the scope of CacheStores, so I'm
>   assuming the operation is defined already)
>
> The design wiki has captured the API we discussed around the
> PartitionHandlingStrategy, but it is missing the details about these
> operations; those should probably be added to the PartitionContext as
> well.
>
> Also, in the scope of CacheStore consistency we discussed the need to
> store the expected nodes of the View: for example, when the grid is
> started and all nodes are finding each other, the Cache shall not be
> considered started until all required nodes have joined.
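To tie the graceful-leave RPC and the "expected number of nodes" hook
together, here is a hypothetical sketch of how they could feed the
availability decision. None of the names below (PartitionState,
memberLeftGracefully, isAvailable) come from the design wiki, and the
majority rule is only one possible policy:

import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import org.jgroups.Address;
import org.jgroups.View;

// Hypothetical sketch of the extra hooks; none of these names are from the
// design wiki. Members that announced a clean shutdown (via an RPC sent
// before leaving) are discounted, while members that simply disappear
// (crash or split) still count against the threshold.
public class PartitionState {

    private volatile int expectedClusterSize;   // admin hook, can be adapted dynamically
    private final Set<Address> gracefulLeavers =
            Collections.newSetFromMap(new ConcurrentHashMap<Address, Boolean>());

    public PartitionState(int expectedClusterSize) {
        this.expectedClusterSize = expectedClusterSize;
    }

    // Admin hook: adjust the expected size when the grid is resized on purpose.
    public void setExpectedClusterSize(int size) {
        this.expectedClusterSize = size;
    }

    // Called when a member's "I am shutting down cleanly" RPC arrives.
    public void memberLeftGracefully(Address member) {
        gracefulLeavers.add(member);
    }

    // One possible policy: require a majority of the nodes that are still
    // *expected* to be around. Shutting down 3 of 5 nodes cleanly leaves
    // 5 - 3 = 2 expected, so the remaining 2 stay available; losing 3 of 5
    // to a split still needs 3 members and stays read-only.
    public boolean isAvailable(View view) {
        int stillExpected = expectedClusterSize - gracefulLeavers.size();
        int majority = stillExpected / 2 + 1;
        return view.size() >= majority;
    }
}

The real decision would of course live behind the PartitionHandlingStrategy /
PartitionContext API from the wiki; this is only meant to illustrate the
information those hooks would have to provide.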
--
Bela Ban, JGroups lead (http://www.jgroups.org)