>>
For example, ordering on latency:
- nodes on one host = 1
- nodes in one rack-blade = 2
- nodes in one server-rack = 3
- nodes in one physical cluster = 4
- nodes in one subnet = 5
- etc.

Maybe it would be better to use some metrics from the ClusterMetrics interface.

The ordering algorithm can be implemented in a class such as a Comparator
and used when we build a cluster or select a place for a new node.
>>

Vyacheslav, please elaborate on how we can determine whether we are on the
same rack. I am not sure this is possible in the general case. Please see my
suggestions below.

>>
However, here is the concern I have. Currently, when a new node joins, the
coordinator assigns an order number to this node (e.g. if we already have
nodes 1, 2 and 3, the new node will have order 4). This node will then be the
last one on the ring, i.e. nodes are always ordered in the ring by this
order number (1->2->3->4->1). If we change this, we will basically allow a
node to be placed anywhere else (something like 1->2->4->3->1). I'm not 100%
sure if this is going to cause issues, but it sounds dangerous.

Yakov, can you please chime in and share your thoughts on this?
>>

I don't think this will cause issues. Node ordering and placement are
implemented in TcpDiscoveryNodesRing, and I think we will just need to
alter the logic of
org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing#nextNode(java.util.Collection<org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNode>)

As for the design, I would suggest the following.

1. Users should be able to define an ARC_ID for the node. I suggest
"arc" since we are using the "ring" concept. This will be the most
honored characteristic for node placement. By default arc_id is 0; it can
be set with the system property IGNITE_DISCO_ARC_ID, an env variable, or
via a new method, TcpDiscoverySpi.setArcId().
So, if I have nodes A, D, G with arc_id set to 1 and B, Z with arc_id set
to 5, then the ring should be built as follows: A->D->G->B->Z->A. Here arcs
can represent different racks or data centers.

I am strongly against giving users the ability to specify an exact place in
the ring with something like this interface [int getIndex(Node newNode,
List<Node> currentRing)]. This is very error prone and may require tricky
consistency checks just to make sure that the implementation of this
interface is consistent across the topology.
With the "arcs" approach, users can automatically assign proper ids based on
the physical network topology and network routes.

2. Subnet - 2nd honored parameter. Nodes on the same subnet should be
placed side by side in the same arc.

3. Physical host - 3rd honored parameter. Nodes on the same physical host
should be placed together automatically in the same arc.

4. The new mode involving points 1-3 should become the default, and we should
also provide the ability to switch to the current mode, which should become
legacy.
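
To make points 1-3 more concrete, here is a rough sketch of the ordering I
have in mind. Everything in it is an assumption for illustration only: the
Node class, its fields, RING_ORDER and nextNode() below are hypothetical and
are not the actual TcpDiscoveryNode or TcpDiscoveryNodesRing code. Subnets
and hosts are compared as plain strings to keep it short, and the
coordinator-assigned order number is used only as the final tie-breaker.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class ArcRingSketch {
    /** Hypothetical node descriptor, not an Ignite class. */
    static final class Node {
        final String name;
        final int arcId;
        final String subnet;
        final String host;
        final long order; // order number assigned by the coordinator on join

        Node(String name, int arcId, String subnet, String host, long order) {
            this.name = name;
            this.arcId = arcId;
            this.subnet = subnet;
            this.host = host;
            this.order = order;
        }
    }

    /** Arc id first, then subnet, then host, then join order as a tie-breaker. */
    static final Comparator<Node> RING_ORDER = Comparator
        .comparingInt((Node n) -> n.arcId)
        .thenComparing(n -> n.subnet)
        .thenComparing(n -> n.host)
        .thenComparingLong(n -> n.order);

    /** Returns the ring successor of the given node, wrapping around at the end. */
    static Node nextNode(TreeSet<Node> ring, Node node) {
        Node next = ring.higher(node);
        return next != null ? next : ring.first();
    }

    /** Lists node names in ring order, starting from the first node. */
    static List<String> walk(TreeSet<Node> ring) {
        List<String> names = new ArrayList<>();
        for (Node n : ring)
            names.add(n.name);
        return names;
    }

    /** Nodes from the example above, added in join order A, B, D, Z, G. */
    static TreeSet<Node> sampleRing() {
        TreeSet<Node> ring = new TreeSet<>(RING_ORDER);
        ring.add(new Node("A", 1, "10.0.1", "h1", 1));
        ring.add(new Node("B", 5, "10.0.9", "h7", 2));
        ring.add(new Node("D", 1, "10.0.1", "h1", 3));
        ring.add(new Node("Z", 5, "10.0.9", "h8", 4));
        ring.add(new Node("G", 1, "10.0.2", "h3", 5));
        return ring;
    }

    public static void main(String[] args) {
        TreeSet<Node> ring = sampleRing();
        System.out.println(walk(ring));
        System.out.println(nextNode(ring, ring.last()).name);
    }
}
```

With this ordering a joining node's position is derived purely from its own
attributes, so every node computes the same ring without any user-supplied
index logic.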

--Yakov
