[ 
https://issues.apache.org/jira/browse/IGNITE-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Vinogradov updated IGNITE-6940:
-------------------------------------
    Description: 
Ignite Thread Pools Starvation

Description
This situation can occur if a user submits tasks that recursively submit more 
tasks and synchronously wait for their results. Jobs arrive at worker nodes and 
are queued forever, because every thread in the public pool is already blocked 
waiting for job results.
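
The effect can be reproduced outside Ignite with a plain java.util.concurrent fixed pool. The following is a minimal sketch, not Ignite code: the class StarvationDemo and the method starves are illustrative names, and the timeout is there only so that the demo terminates.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class StarvationDemo {
    /**
     * Submits a parent task that submits a child task to the same pool
     * and blocks on its result. Returns true if the parent did not
     * finish within timeoutMs.
     */
    public static boolean starves(int poolSize, long timeoutMs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            Future<Integer> parent = pool.submit(() -> {
                // The child is queued to the same pool while the parent
                // keeps its thread occupied waiting for the result.
                Future<Integer> child = pool.submit(() -> 42);
                return child.get();
            });
            try {
                parent.get(timeoutMs, TimeUnit.MILLISECONDS);
                return false;
            } catch (TimeoutException e) {
                return true;
            }
        } finally {
            // Interrupts the blocked parent so the JVM can exit.
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("pool=1 starves: " + starves(1, 500));
        System.out.println("pool=2 starves: " + starves(2, 500));
    }
}
```

With a single thread the child can never start, because the only thread is held by the parent. With two threads the child runs on the free thread and the parent completes.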

Detection and Solution
A timeout can be set for a task, so that the task is canceled automatically 
when the timeout expires.
Web Console should provide the ability to cancel any task or job from the UI.
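
Automatic cancellation on timeout can be sketched with standard executors. This is an illustration under assumptions, not the Ignite implementation: TimeoutCancel, submitWithTimeout and cancelledByTimeout are hypothetical names, and a separate watchdog scheduler is assumed so that cancellation does not depend on the starved pool.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class TimeoutCancel {
    /** Submits the task and schedules its cancellation after timeoutMs. */
    public static <T> Future<T> submitWithTimeout(ExecutorService pool,
            ScheduledExecutorService watchdog, Callable<T> task, long timeoutMs) {
        Future<T> fut = pool.submit(task);
        // Cancelling an already completed future has no effect.
        watchdog.schedule(() -> { fut.cancel(true); }, timeoutMs, TimeUnit.MILLISECONDS);
        return fut;
    }

    /** Returns true if a task sleeping for taskMs was cancelled by the watchdog. */
    public static boolean cancelledByTimeout(long taskMs, long timeoutMs) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        ScheduledExecutorService watchdog = Executors.newSingleThreadScheduledExecutor();
        try {
            Future<String> fut = submitWithTimeout(pool, watchdog, () -> {
                Thread.sleep(taskMs);
                return "done";
            }, timeoutMs);
            try {
                fut.get();
                return false;
            } catch (CancellationException e) {
                return true;
            }
        } finally {
            pool.shutdownNow();
            watchdog.shutdownNow();
        }
    }
}
```

The watchdog runs on its own thread, so it still fires when every worker thread of the task pool is blocked.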

Report
Timed-out tasks and jobs should be reported in Web Console and in the logs. We 
need to introduce a new configuration property that sets the timeout after 
which jobs are reported.
The log record and the Web Console entry should include:
- Master node ID
- Start time
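
One possible shape for such a report is sketched below. Everything here is an assumption made for illustration: the class LongRunningJobReport, its methods, the reportTimeoutMs parameter (standing in for the proposed config property) and the log line format are hypothetical and do not exist in Ignite.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class LongRunningJobReport {
    /** Data the report needs for each running job. */
    private static final class JobInfo {
        final UUID masterNodeId;
        final Instant startTime;

        JobInfo(UUID masterNodeId, Instant startTime) {
            this.masterNodeId = masterNodeId;
            this.startTime = startTime;
        }
    }

    private final Map<String, JobInfo> running = new ConcurrentHashMap<>();

    /** Hypothetical stand-in for the proposed config property. */
    private final long reportTimeoutMs;

    public LongRunningJobReport(long reportTimeoutMs) {
        this.reportTimeoutMs = reportTimeoutMs;
    }

    public void onJobStart(String jobId, UUID masterNodeId, Instant startTime) {
        running.put(jobId, new JobInfo(masterNodeId, startTime));
    }

    public void onJobFinish(String jobId) {
        running.remove(jobId);
    }

    /** Returns one log line per job that has been running longer than the timeout. */
    public List<String> report(Instant now) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, JobInfo> e : running.entrySet()) {
            JobInfo info = e.getValue();
            if (Duration.between(info.startTime, now).toMillis() > reportTimeoutMs) {
                lines.add("Job timed out [jobId=" + e.getKey()
                    + ", masterNodeId=" + info.masterNodeId
                    + ", startTime=" + info.startTime + "]");
            }
        }
        return lines;
    }
}
```

The current time is passed in explicitly so that the check is deterministic and easy to test.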

  was:There is existing code in {{IgniteKernal.start()}} that logs warnings 
when it detects starvation. It should be improved to support more thread pools 
and update some metrics.
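
A periodic starvation check of this kind can be sketched against a standard ThreadPoolExecutor. This is not the code in {{IgniteKernal.start()}}; the class StarvationCheck and the heuristic (no task completed since the previous check while the queue is non-empty) are assumptions used for illustration.

```java
import java.util.concurrent.ThreadPoolExecutor;

public class StarvationCheck {
    private final ThreadPoolExecutor pool;

    /** Completed task count observed at the previous check. */
    private long lastCompleted;

    public StarvationCheck(ThreadPoolExecutor pool) {
        this.pool = pool;
        this.lastCompleted = pool.getCompletedTaskCount();
    }

    /**
     * Intended to be called periodically from a single monitoring thread.
     * Returns true if no task has completed since the previous call
     * while tasks are still waiting in the queue.
     */
    public boolean possiblyStarved() {
        long completed = pool.getCompletedTaskCount();
        boolean starved = completed == lastCompleted && !pool.getQueue().isEmpty();
        lastCompleted = completed;
        return starved;
    }
}
```

Supporting more thread pools would then amount to keeping one such checker per pool and logging a warning, or updating a metric, whenever a check returns true.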


> Thread Starvation monitoring
> ----------------------------
>
>                 Key: IGNITE-6940
>                 URL: https://issues.apache.org/jira/browse/IGNITE-6940
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Andrey Kuznetsov
>            Assignee: Andrey Kuznetsov
>              Labels: iep-7



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
