[jira] [Updated] (IGNITE-5569) TCP Discovery SPI allows multiple NODE_JOINED / NODE_FAILED leading to a cluster DDoS

Alexey Goncharuk (JIRA) Thu, 22 Jun 2017 02:26:15 -0700

     [ https://issues.apache.org/jira/browse/IGNITE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Alexey Goncharuk updated IGNITE-5569:
-------------------------------------
    Description: 
A firewall configuration issue may effectively lead to a cluster DDoS. The 
scheme is as follows:

1) A node G joins the cluster, and a firewall rule forbids incoming connections
from the cluster to this node
2) The cluster successfully processes NodeAddedMessage and fires a discovery
NODE_JOINED event (not sure why)
3) The last node in the ring fails to connect to the newly joined node and
generates a NODE_FAILED event
4) The coordinator drops the connection, and the joining node attempts to
connect again
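
The scheme above can be illustrated with a toy simulation. This is only a
sketch and not Ignite code: the class JoinLoopSketch, the method simulate and
the event strings are hypothetical names chosen for illustration. It assumes
that the joining node can always reach the coordinator, and that the last node
in the ring can never open a connection to it while the firewall rule is in
place, so every attempt yields one NODE_JOINED followed by one NODE_FAILED.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the join loop; not Ignite code, all names are hypothetical. */
final class JoinLoopSketch {
    /**
     * Simulates up to {@code attempts} join attempts by one node and returns
     * the discovery events the cluster would fire under the scheme described
     * in the issue. {@code inboundBlocked} models the firewall rule.
     */
    static List<String> simulate(String nodeId, boolean inboundBlocked, int attempts) {
        List<String> events = new ArrayList<>();
        for (int i = 0; i < attempts; i++) {
            // The outbound connection to the coordinator succeeds, the add
            // message travels the ring, and NODE_JOINED is fired.
            events.add("NODE_JOINED " + nodeId);
            if (!inboundBlocked) {
                // The ring closes normally, so no retry is needed.
                break;
            }
            // The last node cannot connect to the new node: NODE_FAILED.
            events.add("NODE_FAILED " + nodeId);
            // The coordinator drops the connection; the next iteration is
            // the retry, which reuses the same node ID.
        }
        return events;
    }

    public static void main(String[] args) {
        // Three attempts behind a blocking firewall.
        simulate("G", true, 3).forEach(System.out::println);
    }
}
```

With the firewall rule in place the number of events grows linearly with the
number of retries; without it the loop ends after a single NODE_JOINED.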

The issues I see here:
1) Neither the coordinator nor the joining node prints the reason why the
joining node failed / did not join. A slight hint (failed to send message to
the next node) is printed on the node with the largest order (the one that
attempted to close the ring), but the root cause (connection refused) is not
printed either
2) The joining node attempts to reconnect to the cluster with the same node ID.
This violates an invariant we heavily rely on: once a node ID leaves a
cluster, this ID never comes back
3) Each discovery event leads to a partition exchange, which blocks all cache
operations for at least the full ring latency. If several nodes are started
on a malicious host, this may lead to almost full cluster degradation
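
To make the node ID invariant and the exchange cost concrete, here is a
minimal sketch. It is not Ignite code: NodeIdGuardSketch, tryJoin, leave and
minBlockedMillis are hypothetical names, and the figures in main are made up.
The estimate assumes that every discovery event triggers exactly one exchange
lasting at least one ring latency and that exchanges do not overlap.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the invariant and the cost; not Ignite code, names are hypothetical. */
final class NodeIdGuardSketch {
    private final Set<String> alive = new HashSet<>();
    private final Set<String> departed = new HashSet<>();

    /** Admits a node unless its ID is already present or has left before. */
    boolean tryJoin(String nodeId) {
        if (departed.contains(nodeId) || alive.contains(nodeId)) {
            return false;
        }
        alive.add(nodeId);
        return true;
    }

    /** Records that a node left or failed; its ID is retired for good. */
    void leave(String nodeId) {
        if (alive.remove(nodeId)) {
            departed.add(nodeId);
        }
    }

    /**
     * Lower bound on the time cache operations stay blocked, assuming one
     * non-overlapping exchange of at least one ring latency per event.
     */
    static long minBlockedMillis(int discoveryEvents, long ringLatencyMillis) {
        return Math.multiplyExact((long) discoveryEvents, ringLatencyMillis);
    }

    public static void main(String[] args) {
        NodeIdGuardSketch guard = new NodeIdGuardSketch();
        System.out.println(guard.tryJoin("G")); // first join is admitted
        guard.leave("G");                       // NODE_FAILED for G
        System.out.println(guard.tryJoin("G")); // same ID again is rejected
        // Made-up figures: 5 nodes, one join and one failure each, 200 ms ring latency.
        System.out.println(minBlockedMillis(5 * 2, 200L));
    }
}
```

A guard of this kind would force the rejected node to pick a fresh ID before
retrying, which keeps the invariant intact but does not by itself reduce the
number of exchanges.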

  was:
A firewall configuration issue may effectively lead to a cluster DDoS. The 
scheme is as follows:

1) A node G joins the cluster, and a firewall rule forbids incoming connections
from the cluster to this node
2) The cluster successfully processes NodeAddedMessage (not sure why) and sends
NodeAddFinishedMessage
3) Each node receives NodeAddFinishedMessage and fires a discovery NODE_JOINED
event
4) The last node in the ring fails to connect to the newly joined node and
generates a NODE_FAILED event
5) The coordinator drops the connection, and the joining node attempts to
connect again

The issues I see here:
1) Neither the coordinator nor the joining node prints the reason why the
joining node failed / did not join. A slight hint (failed to send message to
the next node) is printed on the node with the largest order (the one that
attempted to close the ring), but the root cause (connection refused) is not
printed either
2) The joining node attempts to reconnect to the cluster with the same node ID.
This violates an invariant we heavily rely on: once a node ID leaves a
cluster, this ID never comes back
3) Each discovery event leads to a partition exchange, which blocks all cache
operations for at least the full ring latency. If several nodes are started
on a malicious host, this may lead to almost full cluster degradation


> TCP Discovery SPI allows multiple NODE_JOINED / NODE_FAILED leading to a 
> cluster DDoS
> -------------------------------------------------------------------------------------
>
>                 Key: IGNITE-5569
>                 URL: https://issues.apache.org/jira/browse/IGNITE-5569
>             Project: Ignite
>          Issue Type: Bug
>          Components: general
>    Affects Versions: 1.7
>            Reporter: Alexey Goncharuk
>             Fix For: 2.1
>
>
> A firewall configuration issue may effectively lead to a cluster DDoS. The 
> scheme is as follows:
> 1) A node G joins the cluster, and a firewall rule forbids incoming
> connections from the cluster to this node
> 2) The cluster successfully processes NodeAddedMessage and fires a discovery
> NODE_JOINED event (not sure why)
> 3) The last node in the ring fails to connect to the newly joined node and
> generates a NODE_FAILED event
> 4) The coordinator drops the connection, and the joining node attempts to
> connect again
> The issues I see here:
> 1) Neither the coordinator nor the joining node prints the reason why the
> joining node failed / did not join. A slight hint (failed to send message to
> the next node) is printed on the node with the largest order (the one that
> attempted to close the ring), but the root cause (connection refused) is not
> printed either
> 2) The joining node attempts to reconnect to the cluster with the same node
> ID. This violates an invariant we heavily rely on: once a node ID leaves a
> cluster, this ID never comes back
> 3) Each discovery event leads to a partition exchange, which blocks all cache
> operations for at least the full ring latency. If several nodes are started
> on a malicious host, this may lead to almost full cluster degradation



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
