[ https://issues.apache.org/jira/browse/IGNITE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vladislav Pyatkov updated IGNITE-4491:
--------------------------------------
Description:
Reproduction steps:
1) Start eight server nodes, four in each data center:
{noformat}
DC1                  DC2
1 (10.116.172.1)     8 (10.116.64.11)
2 (10.116.172.2)     7 (10.116.64.12)
3 (10.116.172.3)     6 (10.116.64.13)
4 (10.116.172.4)     5 (10.116.64.14)
{noformat}
Each server node has a client node running on the same host (see the source code in the attachment).
2) Drop the connection between nodes 1 and 8:
{noformat}
1 (10.116.172.1) 8 (10.116.64.11)
{noformat}
Drop all incoming and outgoing traffic by running the following on 10.116.172.1:
{code}
# Drop all packets coming from node 8 (10.116.64.11)
iptables -A INPUT -s 10.116.64.11 -j DROP
# Drop all packets going to node 8
iptables -A OUTPUT -d 10.116.64.11 -j DROP
{code}
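The two rules above can be wrapped in a small helper so that the same cut can be applied on any host and later undone. This is a minimal sketch assuming a POSIX shell; the function name partition_rules and its cut/restore arguments are hypothetical and not part of Ignite or the attached sources. It only prints the commands so they can be reviewed first; iptables -D with the same rule specification deletes a rule previously added with -A.

{code}
# Hypothetical helper: print the iptables commands that cut (or restore)
# all traffic between this host and a peer node.
#   $1 - "cut" to append DROP rules, "restore" to delete them again
#   $2 - IP address of the peer node
partition_rules() {
  local action="$1" peer="$2" flag
  case "$action" in
    cut)     flag="-A" ;;  # append DROP rules to the chains
    restore) flag="-D" ;;  # delete the identical rules
    *) echo "usage: partition_rules cut|restore PEER_IP" >&2; return 1 ;;
  esac
  echo "iptables $flag INPUT -s $peer -j DROP"
  echo "iptables $flag OUTPUT -d $peer -j DROP"
}
{code}

For example, on 10.116.172.1 run partition_rules cut 10.116.64.11 | sudo sh to reproduce the cut, and partition_rules restore 10.116.64.11 | sudo sh to heal it afterwards.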
Then drop the connection between nodes 4 and 5 in the same way:
{noformat}
4 (10.116.172.4) 5 (10.116.64.14)
{noformat}
Run the following on 10.116.172.4:
{code}
# Drop all packets coming from node 5 (10.116.64.14)
iptables -A INPUT -s 10.116.64.14 -j DROP
# Drop all packets going to node 5
iptables -A OUTPUT -d 10.116.64.14 -j DROP
{code}
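Before waiting for the grid to react, it can help to confirm that the DROP rules are actually in place on each DC1 host. This is a sketch assuming an iptables version that supports the -C (check) option, which exits with status 0 only when a matching rule exists; check_partition and the IPTABLES override variable are hypothetical names used for illustration.

{code}
# Hypothetical helper: report whether both DROP rules for a peer are present.
# Uses `iptables -C`, which succeeds only if the given rule already exists.
# IPTABLES may be overridden, e.g. IPTABLES="sudo iptables".
check_partition() {
  local peer="$1" ipt="${IPTABLES:-iptables}"
  # $ipt is left unquoted on purpose so "sudo iptables" splits into two words
  if $ipt -C INPUT -s "$peer" -j DROP 2>/dev/null &&
     $ipt -C OUTPUT -d "$peer" -j DROP 2>/dev/null; then
    echo "partitioned from $peer"
  else
    echo "NOT partitioned from $peer"
    return 1
  fi
}
{code}

For example, on 10.116.172.4 run IPTABLES="sudo iptables" check_partition 10.116.64.14; a non-zero exit status means at least one of the two rules is missing.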
3) After several seconds, the grid stops.
In the logs written after the traffic is dropped, you can see which server nodes were segmented (note that none of the clients were segmented):
{noformat}
[12:04:33,914][INFO][disco-event-worker-#211%null%][GridDiscoveryManager] Topology snapshot [ver=18, servers=6, clients=8, CPUs=456, heap=68.0GB]
{noformat}
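With 8 servers and 8 clients started in step 1, a snapshot reporting servers=6, clients=8 means two server nodes left the topology while every client stayed. A sketch of pulling those counts out of a node log, assuming the snapshot is written on a single line in the log file; topology_counts is a hypothetical helper, and the log path depends on your logging configuration.

{code}
# Hypothetical helper: print the server and client counts from the last
# "Topology snapshot" line that GridDiscoveryManager wrote to a log file.
#   $1 - path to an Ignite node log
topology_counts() {
  grep 'Topology snapshot' "$1" | tail -n 1 |
    sed -n 's/.*servers=\([0-9]*\), clients=\([0-9]*\).*/servers=\1 clients=\2/p'
}
{code}

Running it against the logs of nodes in both data centers and comparing the result with the 8 servers and 8 clients from step 1 shows which nodes each side still sees.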
At the same time, all operations in the cluster stopped.
was:
Reproduction steps:
1) Start eight server nodes, four in each data center:
{noformat}
DC1                  DC2
1 (10.116.172.1)     8 (10.116.64.11)
2 (10.116.172.2)     7 (10.116.64.12)
3 (10.116.172.3)     6 (10.116.64.13)
4 (10.116.172.4)     5 (10.116.64.14)
{noformat}
Each server node has a client node running on the same host (see the source code in the attachment).
2) Drop the connection between nodes 1 and 8:
{noformat}
1 (10.116.172.1) 8 (10.116.64.11)
{noformat}
Drop all incoming and outgoing traffic by running the following on 10.116.172.1:
{noformat}
iptables -A INPUT -s 10.116.64.11 -j DROP
iptables -A OUTPUT -d 10.116.64.11 -j DROP
{noformat}
Then drop the connection between nodes 4 and 5 in the same way:
{noformat}
4 (10.116.172.4) 5 (10.116.64.14)
{noformat}
Run the following on 10.116.172.4:
{noformat}
iptables -A INPUT -s 10.116.64.14 -j DROP
iptables -A OUTPUT -d 10.116.64.14 -j DROP
{noformat}
3) After several seconds, the grid stops.
In the logs written after the traffic is dropped, you can see which server nodes were segmented (note that none of the clients were segmented):
{noformat}
[12:04:33,914][INFO][disco-event-worker-#211%null%][GridDiscoveryManager] Topology snapshot [ver=18, servers=6, clients=8, CPUs=456, heap=68.0GB]
{noformat}
At the same time, all operations in the cluster stopped.
> Connectivity loss between two nodes hangs the whole cluster.
> ------------------------------------------------------------
>
> Key: IGNITE-4491
> URL: https://issues.apache.org/jira/browse/IGNITE-4491
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 1.8
> Reporter: Vladislav Pyatkov
> Priority: Critical
> Attachments: Segmentation.7z
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)