[ 
https://issues.apache.org/jira/browse/IGNITE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583765#comment-16583765
 ] 

Dmitriy Pavlov commented on IGNITE-5569:
----------------------------------------

[~dkarachentsev] the tests seem to be more or less OK, but I've retriggered 
the failed suites anyway.

Could you please go through the review checklist items and share the results as a JIRA comment?

https://lists.apache.org/thread.html/3196274d0be41ebd722536542914a0d86bab9d6764d14217681dedb3@%3Cdev.ignite.apache.org%3E

See https://cwiki.apache.org/confluence/display/IGNITE/Review+Checklist

E.g. the comment can be as follows:
1.a API compatibility MUST be maintained 
No API changes
1.b Default behavior SHOULD NOT be changed
Default behavior was not changed.
etc.

Then we can ask Yakov to review.

> TCP Discovery SPI allows multiple NODE_JOINED / NODE_FAILED leading to a 
> cluster DDoS
> -------------------------------------------------------------------------------------
>
>                 Key: IGNITE-5569
>                 URL: https://issues.apache.org/jira/browse/IGNITE-5569
>             Project: Ignite
>          Issue Type: Bug
>          Components: general
>    Affects Versions: 1.7
>            Reporter: Alexey Goncharuk
>            Assignee: Dmitry Karachentsev
>            Priority: Major
>             Fix For: 2.7
>
>
> A firewall configuration issue may effectively lead to a cluster DDoS. The 
> scheme is as follows:
> 1) A node G joins the cluster, and a firewall rule forbids incoming 
> connections from the cluster to this node
> 2) The cluster successfully processes NodeAddedMessage and fires a discovery 
> NODE_JOINED event (not sure why?)
> 4) The last node in the ring fails to connect to the newly joined node and 
> generates a NODE_FAILED event
> 5) The coordinator drops the connection, and the joining node attempts to 
> connect again
> The issues I see here:
> 1) Neither the coordinator nor the joining node prints out the reason why the 
> joining node failed / did not join. A slight hint (failed to send message to 
> the next node) is printed on the node with the largest order (the one that 
> attempted to close the ring), but the root cause (connection refused) is not 
> printed either
> 2) The joining node attempts to connect to the cluster with the same node ID. 
> This violates an invariant we heavily rely on: once a node ID leaves a 
> cluster, this ID never comes back again
> 3) Each discovery event leads to a partition exchange, which blocks all cache 
> operations for a time interval at least equal to the full ring latency. 
> If several nodes are started on a malicious host, this may lead to almost 
> full cluster degradation
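
For illustration only, the invariant from issue 2) above (a node ID that has left the cluster never comes back) could be sketched roughly as below. This is not Ignite code: NodeIdRegistry, tryJoin and markLeft are hypothetical names made up for this example, and the real discovery SPI tracks much more state.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

/**
 * Hypothetical helper (not part of Ignite): remembers which node IDs are
 * currently in the topology and which ones have already left it.
 */
final class NodeIdRegistry {
    /** IDs of nodes currently considered alive. */
    private final Set<UUID> alive = new HashSet<>();

    /** IDs of nodes that were in the topology once and then left or failed. */
    private final Set<UUID> departed = new HashSet<>();

    /**
     * Accepts a join only for an ID that has never been seen before.
     *
     * @return true if the join is accepted, false if the ID is already
     *         alive or has previously left the cluster.
     */
    synchronized boolean tryJoin(UUID nodeId) {
        if (alive.contains(nodeId) || departed.contains(nodeId))
            return false;

        alive.add(nodeId);

        return true;
    }

    /** Records that a node left or failed; its ID can never be reused. */
    synchronized void markLeft(UUID nodeId) {
        if (alive.remove(nodeId))
            departed.add(nodeId);
    }
}
```

With a check of this kind, a joining node that retries with the same ID after a NODE_FAILED event would be rejected instead of triggering another NODE_JOINED / NODE_FAILED pair.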



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
