[jira] [Updated] (IGNITE-12499) Node took a long time to start after kill

Ivan Bessonov (Jira) Fri, 07 Feb 2020 06:14:49 -0800


     [ 
https://issues.apache.org/jira/browse/IGNITE-12499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ivan Bessonov updated IGNITE-12499:
-----------------------------------
    Description: 
Test scenario:
 1) Start 4 node cluster
 2) Activate
 3) Load 1k rows to each cache
 4) Stop node
 5) Return it back without index.bin files
 6) Wait until start

Somehow the first node takes about 1166 sec to start ("Waiting for topology 
snapshot: server(s) 4/4, client(s) 0/*, timeout 1166/1800 sec").

[10:47:21,360][INFO][main][G] Node started : [stage="Configure system pool" 
(129 ms),stage="Start managers" (440 ms),stage="Configure binary metadata" (86 
ms),stage="Start processors" (39341 ms),stage="Start 'GridGain' plugin" (16 
ms),stage="Init and start regions" (210 ms),stage="Restore binary memory" 
(228224 ms),stage="Restore logical state" (859694 ms),stage="Finish recovery" 
(8938 ms),stage="Join topology" (6024 ms),stage="Await transition" (16 
ms),stage="Await exchange" (14855 ms),stage="Total time" (1157973 ms)]
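
To make the breakdown in the log line above easier to read, here is a minimal Java sketch that extracts the per-stage timings and each stage's share of the total. The class name StageTimings and the regular expression are illustrative assumptions, not part of Ignite; only the log text is taken from this report.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StageTimings {
    /** The "Node started" log line from this report, joined onto one line. */
    public static final String LOG_LINE =
        "[10:47:21,360][INFO][main][G] Node started : [stage=\"Configure system pool\" (129 ms),"
        + "stage=\"Start managers\" (440 ms),stage=\"Configure binary metadata\" (86 ms),"
        + "stage=\"Start processors\" (39341 ms),stage=\"Start 'GridGain' plugin\" (16 ms),"
        + "stage=\"Init and start regions\" (210 ms),stage=\"Restore binary memory\" (228224 ms),"
        + "stage=\"Restore logical state\" (859694 ms),stage=\"Finish recovery\" (8938 ms),"
        + "stage=\"Join topology\" (6024 ms),stage=\"Await transition\" (16 ms),"
        + "stage=\"Await exchange\" (14855 ms),stage=\"Total time\" (1157973 ms)]";

    // Matches entries such as: stage="Restore logical state" (859694 ms)
    private static final Pattern STAGE =
        Pattern.compile("stage=\"([^\"]+)\"\\s*\\((\\d+)\\s*ms\\)");

    /** Returns stage name -> duration in milliseconds, in log order. */
    public static Map<String, Long> parse(String logLine) {
        Map<String, Long> stages = new LinkedHashMap<>();
        Matcher m = STAGE.matcher(logLine);
        while (m.find())
            stages.put(m.group(1), Long.parseLong(m.group(2)));
        return stages;
    }

    public static void main(String[] args) {
        Map<String, Long> stages = parse(LOG_LINE);
        long total = stages.remove("Total time");

        long sum = 0;
        for (long ms : stages.values())
            sum += ms;

        // prints "stages sum to 1157973 ms, reported total 1157973 ms"
        System.out.println("stages sum to " + sum + " ms, reported total " + total + " ms");

        // Integer percentages (rounded down) of the reported total.
        for (Map.Entry<String, Long> e : stages.entrySet())
            System.out.println(e.getKey() + ": " + e.getValue() + " ms (~"
                + (100 * e.getValue() / total) + "%)");
    }
}
```

The individual stages add up exactly to the reported total, and "Restore logical state" alone accounts for roughly 74% of it.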
h3. Clarification:

"Restore logical state" is the longest stage, and it runs on a single thread, 
so CPU/IO utilization is very low. Executing "restorePartitionStates" in an 
ExecutorService would drastically speed up the whole node startup process, 
because that method is the main reason the restore is slow in this particular 
case.
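
The general shape of the proposal can be sketched as follows. This is a minimal illustration, not the actual fix and not Ignite code: restorePartition and restoreAll are hypothetical names, and the sketch assumes that each partition's state can be restored independently of the others.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelRestoreSketch {
    /** Hypothetical stand-in for restoring one partition's state; NOT an Ignite API. */
    static int restorePartition(int partId) {
        // Real code would read the partition's persisted state here.
        return partId;
    }

    /** Restores partitions [0, partCnt) on a fixed pool; returns how many were restored. */
    public static int restoreAll(int partCnt, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int p = 0; p < partCnt; p++) {
                final int partId = p;
                tasks.add(() -> restorePartition(partId));
            }

            int restored = 0;
            // invokeAll blocks until every submitted task has completed.
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                f.get(); // surfaces a task failure as ExecutionException
                restored++;
            }
            return restored;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while restoring partitions", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Partition restore failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors();
        // prints "restored 1024 partitions"
        System.out.println("restored " + restoreAll(1024, threads) + " partitions");
    }
}
```

Any speedup depends on the per-partition work really being independent and on the disk keeping up with concurrent reads, which this sketch does not measure.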

  was:
Test scenario:
 1) Start 4 node cluster
 2) Activate
 3) Load 1k rows to each cache
 4) Stop node
 5) Return it back without index.bin files
 6) Wait until start

Somehow the first node takes about 1166 sec to start ("Waiting for topology 
snapshot: server(s) 4/4, client(s) 0/*, timeout 1166/1800 sec").

[10:47:21,360][INFO][main][G] Node started : [stage="Configure system pool" 
(129 ms),stage="Start managers" (440 ms),stage="Configure binary metadata" (86 
ms),stage="Start processors" (39341 ms),stage="Start 'GridGain' plugin" (16 
ms),stage="Init and start regions" (210 ms),stage="Restore binary memory" 
(228224 ms),stage="Restore logical state" (859694 ms),stage="Finish recovery" 
(8938 ms),stage="Join topology" (6024 ms),stage="Await transition" (16 
ms),stage="Await exchange" (14855 ms),stage="Total time" (1157973 ms)]
h3. Clarification:

"Restore logical state" stage is the longest one and it uses one thread, so 
CPU/IO utilization is very low. Execution of the same operation in some 
ExecutorService would drastically speed up the whole node startup process.


> Node took a long time to start after kill
> -----------------------------------------
>
>                 Key: IGNITE-12499
>                 URL: https://issues.apache.org/jira/browse/IGNITE-12499
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Ivan Bessonov
>            Assignee: Ivan Bessonov
>            Priority: Major
>             Fix For: 2.9
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Test scenario:
>  1) Start 4 node cluster
>  2) Activate
>  3) Load 1k rows to each cache
>  4) Stop node
>  5) Return it back without index.bin files
>  6) Wait until start
> Somehow the first node takes about 1166 sec to start ("Waiting for topology 
> snapshot: server(s) 4/4, client(s) 0/*, timeout 1166/1800 sec").
> [10:47:21,360][INFO][main][G] Node started : [stage="Configure system pool" 
> (129 ms),stage="Start managers" (440 ms),stage="Configure binary metadata" 
> (86 ms),stage="Start processors" (39341 ms),stage="Start 'GridGain' plugin" 
> (16 ms),stage="Init and start regions" (210 ms),stage="Restore binary memory" 
> (228224 ms),stage="Restore logical state" (859694 ms),stage="Finish recovery" 
> (8938 ms),stage="Join topology" (6024 ms),stage="Await transition" (16 
> ms),stage="Await exchange" (14855 ms),stage="Total time" (1157973 ms)]
> h3. Clarification:
> "Restore logical state" is the longest stage, and it runs on a single thread, 
> so CPU/IO utilization is very low. Executing "restorePartitionStates" in an 
> ExecutorService would drastically speed up the whole node startup process, 
> because that method is the main reason the restore is slow in this particular 
> case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
