TBH I don't understand which problem this thread is about :) Surely network partitions are a problem, but there are many forms of "partition", and many different opinions on what "acceptable" behaviour the grid should implement, which largely depend on the assumptions the client application is making.
Since we seem to be discussing a case in which the minority group is expected to flip into read-only mode, could we step back and describe:
- why this is an acceptable solution for some class of applications?
- what kind of potential network failure we want to take compensating actions for?

I'm not an expert on how people physically wire up single nodes, racks and rooms to allow for our virtual connections, but let's assume that all nodes are connected to each other with a single "cable". Or, if in practice multiple cables are used, could we rely on system configuration to guarantee that packets can find alternative routes if one wire is eaten by mice?

It seems important to me to define what level of network failure we want to address. For example, are we assuming we don't deal with cases in which nodes can talk to one group but not vice-versa? If the effect of a network failure is a completely isolated group, can we assume Hot Rod clients can't reach them either? If the group is totally isolated, would it still need read-only mode (with the risk of outdated reads), or could the whole group just shut down since it's not reachable by anyone anyway?

That makes further assumptions, for instance that every state change also arrives via the network. That would not suit, for example, driving an assembly line in a manufacturing plant; but then again it might be safer to stop the production belt than to go ahead without being able to perform fresh read operations.

I'm just trying to give an example of an entirely different class of requirements, not proposing any solution. It seems to me that, given the complexity of the problem, we'll always need to make some trade-off, and which trade-off is acceptable depends on the problem. If we describe a very specific problem, we can work to make sure Infinispan and JGroups have enough extension points and smart protocols to deal with it, but I don't think we can resolve this issue at a one-size-fits-all level.
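
To make the questions above a bit more concrete, here is a minimal sketch of how such a decision could be expressed as an application-chosen policy. Everything in it is hypothetical: `PartitionPolicy`, `Mode`, `MinorityBehaviour` and `decide` are names made up for illustration and are not Infinispan or JGroups API. It also assumes the group knows the expected cluster size and can somehow tell whether clients can still reach it, which is exactly the kind of assumption that would need to be spelled out.

```java
/**
 * Hypothetical sketch only: none of these types exist in Infinispan or JGroups.
 * It illustrates that the "right" reaction to a partition depends on what the
 * application is willing to tolerate.
 */
public final class PartitionPolicy {

    /** What a group of nodes does after a view change. */
    public enum Mode { READ_WRITE, READ_ONLY, SHUTDOWN }

    /** Assumed application-level choice for a minority group. */
    public enum MinorityBehaviour { SERVE_STALE_READS, STOP }

    private PartitionPolicy() { }

    /**
     * @param visibleMembers    nodes this group can currently see, itself included
     * @param expectedMembers   size of the full cluster (assumed to be known)
     * @param clientsCanReachUs whether any client can still reach this group (assumed detectable)
     * @param behaviour         what the application tolerates from a minority group
     */
    public static Mode decide(int visibleMembers, int expectedMembers,
                              boolean clientsCanReachUs, MinorityBehaviour behaviour) {
        if (expectedMembers <= 0 || visibleMembers < 0 || visibleMembers > expectedMembers) {
            throw new IllegalArgumentException("inconsistent membership counts");
        }
        // A strict majority keeps full service. On an even split neither side
        // has a strict majority, so both sides are treated as a minority.
        if (2 * visibleMembers > expectedMembers) {
            return Mode.READ_WRITE;
        }
        // A minority nobody can reach, or one whose application prefers to
        // stop rather than risk outdated reads, shuts down.
        if (!clientsCanReachUs || behaviour == MinorityBehaviour.STOP) {
            return Mode.SHUTDOWN;
        }
        // Otherwise the minority keeps serving reads, which may be stale.
        return Mode.READ_ONLY;
    }
}
```

With this sketch, 2 visible nodes out of 5 with reachable clients and `SERVE_STALE_READS` would go read-only, while the same group with `STOP` (the production belt case) would shut down. Asymmetric failures, where nodes can talk in one direction only, are deliberately not covered.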
Sanne
