>> For example, ordering on latency:
- nodes on one host = 1
- nodes in one rack-blade = 2
- nodes in one server-rack = 3
- nodes in one physical cluster = 4
- nodes in one subnet = 5
- etc.

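As a rough illustration of the latency levels listed above, here is a minimal Java sketch. NodeLocation and LatencyRank are hypothetical names, not Ignite classes, and the sketch assumes each node somehow knows its host, blade, rack, cluster and subnet. The fallback value 6 for "nothing shared" is also an assumption standing in for the "etc." entry. A level that is unknown (null) is never treated as shared.

```java
/** Hypothetical description of where a node runs; not an Ignite class. */
final class NodeLocation {
    final String host, blade, rack, cluster, subnet;

    NodeLocation(String host, String blade, String rack, String cluster, String subnet) {
        this.host = host;
        this.blade = blade;
        this.rack = rack;
        this.cluster = cluster;
        this.subnet = subnet;
    }
}

/** Maps a pair of nodes to the latency level from the list (1 = closest). */
final class LatencyRank {
    /** Two values match only if both are known and equal. */
    private static boolean same(String x, String y) {
        return x != null && x.equals(y);
    }

    /** Returns 1..5 for the closest shared level, or 6 (assumed) if nothing is shared. */
    static int rank(NodeLocation a, NodeLocation b) {
        if (same(a.host, b.host)) return 1;
        if (same(a.blade, b.blade)) return 2;
        if (same(a.rack, b.rack)) return 3;
        if (same(a.cluster, b.cluster)) return 4;
        if (same(a.subnet, b.subnet)) return 5;
        return 6;
    }

    public static void main(String[] args) {
        NodeLocation a = new NodeLocation("h1", "b1", "r1", "c1", "10.0.0.0/24");
        NodeLocation b = new NodeLocation("h2", "b1", "r1", "c1", "10.0.0.0/24");
        System.out.println(rank(a, b)); // different hosts, same blade: prints 2
    }
}
```

Skipping unknown levels means a deployment that cannot detect racks or blades still gets a usable, if coarser, ranking from host and subnet alone.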
Maybe it would be better to use some metrics from the ClusterMetrics interface. The ordering algorithm can be implemented in a class such as a Comparator and used when we build a cluster or select a place for a new node.

>> Vyacheslav, please elaborate on how we can determine whether we are on the same rack. I am not sure this is possible in the general case. Please see my suggestions below.

>> However, here is the concern I have. Currently, when a new node joins, the coordinator assigns an order number to this node (e.g. if we already have nodes 1, 2 and 3, the new node will have order 4). This node will then be the last one on the ring, i.e. nodes are always ordered in the ring by this order number (1->2->3->4->1). If we change this, we will basically allow a node to be placed anywhere else (something like 1->2->4->3->1). I'm not 100% sure if this is going to cause issues, but it sounds dangerous. Yakov, can you please chime in and share your thoughts on this?

>> I don't think this will cause issues. Node ordering and placement is implemented in TcpDiscoveryNodesRing, and I think that we will just need to alter the logic of org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing#nextNode(java.util.Collection<org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNode>).

As far as the design of this goes, I would suggest the following.

1. The user should be able to define an ARC_ID for the node. I suggest "arc" for this since we are using the "ring" concept. This will be the most honored characteristic for node placement. By default arc_id is 0, and it can be set with the system property IGNITE_DISCO_ARC_ID, an env variable, or via TcpDiscoverySpi.setArcId() (a new method). So, if I have nodes A, D, G with arc_id set to 1 and B, Z with arc_id set to 5, then the ring should be built as follows: A->D->G->B->Z->A. Here arcs can represent different racks or data centers.

I am strongly against giving the user an opportunity to point to an exact place in the ring with something like this interface [int getIndex(Node newNode, List<Node> currentRing)]. This is very error-prone and may require tricky consistency checks just to make sure that the implementation of this interface is consistent across the topology. With the "arcs" approach, the user can automatically assign proper ids based on the physical network topology and network routes.

2. Subnet - 2nd honored parameter. Nodes on the same subnet should be placed side by side in the same arc.

3. Physical host - 3rd honored parameter. Nodes on the same physical host should be placed together automatically in the same arc.

4. The new mode involving points 1-3 should become the default, and we should also provide the ability to switch to the current mode, which should become legacy.

--Yakov
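
The placement rules in points 1-3 can be sketched as a single comparator. This is only an illustration under stated assumptions, not the TcpDiscoveryNodesRing implementation: RingNode and ArcRing are hypothetical classes, the IGNITE_DISCO_ARC_ID property is the one proposed in this thread and does not exist yet, and sorting by the subnet and host strings is just one simple way to keep nodes with equal values adjacent. The coordinator-assigned order number is used as the final tie-breaker, so nodes that share arc, subnet and host keep their join order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical view of a discovery node; not an Ignite class. */
final class RingNode {
    final String name;
    final int arcId;      // proposed ARC_ID, 0 by default
    final String subnet;
    final String host;
    final long order;     // order number assigned by the coordinator on join

    RingNode(String name, int arcId, String subnet, String host, long order) {
        this.name = name;
        this.arcId = arcId;
        this.subnet = subnet;
        this.host = host;
        this.order = order;
    }
}

final class ArcRing {
    /** Arc first, then subnet, then physical host, then join order. */
    static final Comparator<RingNode> PLACEMENT =
        Comparator.comparingInt((RingNode n) -> n.arcId)
            .thenComparing(n -> n.subnet)
            .thenComparing(n -> n.host)
            .thenComparingLong(n -> n.order);

    /** Reads the arc id from the system property proposed in this thread (default 0). */
    static int configuredArcId() {
        return Integer.getInteger("IGNITE_DISCO_ARC_ID", 0);
    }

    /** Returns node names in ring order; the last node links back to the first. */
    static List<String> ring(List<RingNode> nodes) {
        List<RingNode> sorted = new ArrayList<>(nodes);
        sorted.sort(PLACEMENT);
        List<String> names = new ArrayList<>();
        for (RingNode n : sorted) names.add(n.name);
        return names;
    }

    /** Returns the successor of the given node, wrapping around at the end of the ring. */
    static String next(List<RingNode> nodes, String name) {
        List<String> ring = ring(nodes);
        int i = ring.indexOf(name);
        if (i < 0) throw new IllegalArgumentException("Unknown node: " + name);
        return ring.get((i + 1) % ring.size());
    }

    public static void main(String[] args) {
        // Join order is B, A, Z, D, G; arcs are 1 for A, D, G and 5 for B, Z.
        List<RingNode> nodes = new ArrayList<>();
        nodes.add(new RingNode("B", 5, "10.0.1.0/24", "host-1", 1));
        nodes.add(new RingNode("A", 1, "10.0.1.0/24", "host-1", 2));
        nodes.add(new RingNode("Z", 5, "10.0.1.0/24", "host-1", 3));
        nodes.add(new RingNode("D", 1, "10.0.1.0/24", "host-1", 4));
        nodes.add(new RingNode("G", 1, "10.0.1.0/24", "host-1", 5));
        System.out.println(ring(nodes)); // prints [A, D, G, B, Z]
    }
}
```

Because every node can evaluate the same comparator over the same attributes, all nodes should arrive at the same ring without the per-implementation consistency checks that an index-returning callback would need.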