[ https://issues.apache.org/jira/browse/IGNITE-12499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ivan Bessonov updated IGNITE-12499:
-----------------------------------
Description:
Test scenario:
1) Start 4 node cluster
2) Activate
3) Load 1k rows to each cache
4) Stop node
5) Return it back without index.bin files
6) Wait until start
Somehow the first node takes about 1166 sec to start, as the test log shows:
Waiting for topology snapshot: server(s) 4/4, client(s) 0/*, timeout 1166/1800 sec
[10:47:21,360][INFO][main][G] Node started : [stage="Configure system pool" (129 ms),
stage="Start managers" (440 ms),stage="Configure binary metadata" (86 ms),
stage="Start processors" (39341 ms),stage="Start 'GridGain' plugin" (16 ms),
stage="Init and start regions" (210 ms),stage="Restore binary memory" (228224 ms),
stage="Restore logical state" (859694 ms),stage="Finish recovery" (8938 ms),
stage="Join topology" (6024 ms),stage="Await transition" (16 ms),
stage="Await exchange" (14855 ms),stage="Total time" (1157973 ms)]
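To see where the time goes, the stage timings in the "Node started" log line can be tallied with a small program. This is only a minimal sketch: the StageTimings class and its regular expression are illustrative and not part of Ignite. On this log, "Restore logical state" accounts for roughly 74% of the total time and "Restore binary memory" for roughly 20%.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Ignite: tallies stage timings from a "Node started" line.
class StageTimings {
    // Matches entries such as: stage="Restore logical state" (859694 ms)
    private static final Pattern STAGE =
        Pattern.compile("stage=\"([^\"]+)\"\\s*\\((\\d+) ms\\)");

    /** Returns stage name -> duration in ms, in the order the stages appear in the line. */
    public static Map<String, Long> parse(String logLine) {
        Map<String, Long> stages = new LinkedHashMap<>();
        Matcher m = STAGE.matcher(logLine);
        while (m.find())
            stages.put(m.group(1), Long.parseLong(m.group(2)));
        return stages;
    }

    public static void main(String[] args) {
        // The log line from this ticket, with the line wraps removed.
        String line = "Node started : [stage=\"Configure system pool\" (129 ms),"
            + "stage=\"Start managers\" (440 ms),"
            + "stage=\"Configure binary metadata\" (86 ms),"
            + "stage=\"Start processors\" (39341 ms),"
            + "stage=\"Start 'GridGain' plugin\" (16 ms),"
            + "stage=\"Init and start regions\" (210 ms),"
            + "stage=\"Restore binary memory\" (228224 ms),"
            + "stage=\"Restore logical state\" (859694 ms),"
            + "stage=\"Finish recovery\" (8938 ms),"
            + "stage=\"Join topology\" (6024 ms),"
            + "stage=\"Await transition\" (16 ms),"
            + "stage=\"Await exchange\" (14855 ms),"
            + "stage=\"Total time\" (1157973 ms)]";

        Map<String, Long> stages = parse(line);
        long total = stages.remove("Total time");

        // Print each stage with its share of the total startup time.
        for (Map.Entry<String, Long> e : stages.entrySet())
            System.out.printf("%-28s %8d ms %5.1f%%%n",
                e.getKey(), e.getValue(), 100.0 * e.getValue() / total);
    }
}
```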
h3. Clarification:
The "Restore logical state" stage is the longest one and it runs on a single
thread, so CPU/IO utilization is very low. Running "restorePartitionStates" in
an ExecutorService would drastically speed up the whole node startup process,
because that call is the main reason the restore is slow in this particular case.
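The idea of moving the per-partition work to an ExecutorService can be sketched as follows. This is an illustration only, not Ignite's actual code: the ParallelRestoreSketch class, the restoreAll method and the restoreOne callback are hypothetical stand-ins for the work done by "restorePartitionStates", and the sketch assumes that partitions can be restored independently of each other.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntConsumer;

// Hypothetical sketch, not Ignite code: fan per-partition restore work out to a thread pool.
class ParallelRestoreSketch {
    /**
     * Runs restoreOne for every partition id in [0, partitions) on a fixed-size pool
     * and blocks until all partitions are done. The first failure is rethrown.
     */
    public static void restoreAll(int partitions, int threads, IntConsumer restoreOne)
        throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int p = 0; p < partitions; p++) {
                final int part = p;
                futures.add(pool.submit(() -> restoreOne.accept(part)));
            }
            // Wait for completion; get() propagates an exception thrown by a task.
            for (Future<?> f : futures)
                f.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger restored = new AtomicInteger();
        // Placeholder restore step: a real one would read the partition state from disk.
        restoreAll(1024, Runtime.getRuntime().availableProcessors(),
            part -> restored.incrementAndGet());
        System.out.println("Restored partitions: " + restored.get());
    }
}
```

The pool is created and shut down inside the call to keep the sketch self-contained; a real implementation would more likely reuse an existing pool and would need to make sure that any state shared between partitions is thread-safe.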
was:
Test scenario:
1) Start 4 node cluster
2) Activate
3) Load 1k rows to each cache
4) Stop node
5) Return it back without index.bin files
6) Wait until start
Somehow the first node takes about 1166 sec to start, as the test log shows:
Waiting for topology snapshot: server(s) 4/4, client(s) 0/*, timeout 1166/1800 sec
[10:47:21,360][INFO][main][G] Node started : [stage="Configure system pool" (129 ms),
stage="Start managers" (440 ms),stage="Configure binary metadata" (86 ms),
stage="Start processors" (39341 ms),stage="Start 'GridGain' plugin" (16 ms),
stage="Init and start regions" (210 ms),stage="Restore binary memory" (228224 ms),
stage="Restore logical state" (859694 ms),stage="Finish recovery" (8938 ms),
stage="Join topology" (6024 ms),stage="Await transition" (16 ms),
stage="Await exchange" (14855 ms),stage="Total time" (1157973 ms)]
h3. Clarification:
The "Restore logical state" stage is the longest one and it runs on a single
thread, so CPU/IO utilization is very low. Running the same operation in an
ExecutorService would drastically speed up the whole node startup process.
> Node took a long time to start after kill
> -----------------------------------------
>
> Key: IGNITE-12499
> URL: https://issues.apache.org/jira/browse/IGNITE-12499
> Project: Ignite
> Issue Type: Bug
> Reporter: Ivan Bessonov
> Assignee: Ivan Bessonov
> Priority: Major
> Fix For: 2.9
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Test scenario:
> 1) Start 4 node cluster
> 2) Activate
> 3) Load 1k rows to each cache
> 4) Stop node
> 5) Return it back without index.bin files
> 6) Wait until start
> Somehow the first node takes about 1166 sec to start, as the test log shows:
> Waiting for topology snapshot: server(s) 4/4, client(s) 0/*, timeout 1166/1800 sec
> [10:47:21,360][INFO][main][G] Node started : [stage="Configure system pool" (129 ms),
> stage="Start managers" (440 ms),stage="Configure binary metadata" (86 ms),
> stage="Start processors" (39341 ms),stage="Start 'GridGain' plugin" (16 ms),
> stage="Init and start regions" (210 ms),stage="Restore binary memory" (228224 ms),
> stage="Restore logical state" (859694 ms),stage="Finish recovery" (8938 ms),
> stage="Join topology" (6024 ms),stage="Await transition" (16 ms),
> stage="Await exchange" (14855 ms),stage="Total time" (1157973 ms)]
> h3. Clarification:
> The "Restore logical state" stage is the longest one and it runs on a single
> thread, so CPU/IO utilization is very low. Running "restorePartitionStates" in
> an ExecutorService would drastically speed up the whole node startup process,
> because that call is the main reason the restore is slow in this particular case.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)