[
https://issues.apache.org/jira/browse/IGNITE-13564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ivan Daschinskiy updated IGNITE-13564:
--------------------------------------
Description:
Currently, reporting of system thread blocking has major drawbacks.
1. Because a blocked system worker is detected by another thread, the failure
handler does not receive full information about the problem. {{FailureContext}}
has only two fields, {{type}} and {{err}}. The throwable {{err}} is created in
the detecting thread, so the context of the original problem is lost.
2. The stack trace of the blocked thread printed in
{{org.apache.ignite.internal.worker.WorkersRegistry#onIdle}} is incomplete.
3. The current approach does not work when there is only one thread in the
registry. This case is not checked and can cause a single thread to loop
infinitely, calling {{onIdle}}. Fixed in
These drawbacks can lead to complete loss of information about the blocked
system thread.
I suggest:
1. Add another parameter to {{FailureContext}}, namely {{worker}}.
2. Fix thread dump printing.
3. Add an assertion for the case when there is only one system thread in the
registry.
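
A minimal sketch of how the three suggestions could fit together. Every class
and method name here ({{WorkerAwareFailureContext}}, {{BlockedWorkerReportSketch}},
{{checkCanDetect}}, {{report}}) is hypothetical and chosen for illustration; this
is not the actual Ignite code or the actual patch. It assumes the blocked worker
can be passed around as a plain {{java.lang.Thread}}, and it uses an explicit
check where the real change might use an assertion.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

/** Hypothetical failure context that also carries the blocked worker thread. */
final class WorkerAwareFailureContext {
    private final String type;
    private final Throwable err;
    private final Thread worker;

    WorkerAwareFailureContext(String type, Throwable err, Thread worker) {
        this.type = type;
        this.err = err;
        this.worker = worker;
    }

    String type() { return type; }

    Throwable error() { return err; }

    Thread worker() { return worker; }

    /** Renders every frame of the worker's stack, not the detecting thread's. */
    String workerStackTrace() {
        return Arrays.stream(worker.getStackTrace())
            .map(frame -> "    at " + frame)
            .collect(Collectors.joining(System.lineSeparator()));
    }
}

final class BlockedWorkerReportSketch {
    /** A single registered worker has no peer to detect that it is blocked. */
    static void checkCanDetect(int registeredWorkers) {
        if (registeredWorkers < 2)
            throw new IllegalStateException(
                "Blocking detection needs at least two workers, got " + registeredWorkers);
    }

    /** Called from the detecting thread; keeps a reference to the blocked one. */
    static WorkerAwareFailureContext report(Thread blocked) {
        Throwable err = new IllegalStateException(
            "Blocked system worker: " + blocked.getName());
        return new WorkerAwareFailureContext("SYSTEM_WORKER_BLOCKED", err, blocked);
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated blocked worker: sleeps until interrupted.
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException ignored) {
                // Interrupted at the end of the demo.
            }
        }, "sys-worker-0");
        worker.setDaemon(true);
        worker.start();
        Thread.sleep(100); // Let the worker reach its sleep call.

        checkCanDetect(2); // The main thread plays the role of the second worker.
        WorkerAwareFailureContext ctx = report(worker);
        System.out.println(ctx.type() + " in " + ctx.worker().getName());
        System.out.println(ctx.workerStackTrace());
        worker.interrupt();
    }
}
```

With the worker available in the context, a failure handler can print the
blocked thread's own frames instead of relying on {{err}}, which only reflects
the detecting thread.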
was:
Currently, reporting of system thread blocking has major drawbacks.
1. Because a blocked system worker is detected by another thread, the failure
handler does not receive full information about the problem. {{FailureContext}}
has only two fields, {{type}} and {{err}}. The throwable {{err}} is created in
the detecting thread, so the context of the original problem is lost.
2. The stack trace of the blocked thread printed in
{{org.apache.ignite.internal.worker.WorkersRegistry#onIdle}} is incomplete.
3. The current approach does not work when there is only one thread in the
registry. This case is not checked and can cause a single thread to loop
infinitely, calling {{onIdle}}.
These drawbacks can lead to complete loss of information about the blocked
system thread.
I suggest:
1. Add another parameter to {{FailureContext}}, namely {{worker}}.
2. Fix thread dump printing.
3. Add an assertion for the case when there is only one system thread in the
registry.
> Improve SYSTEM_WORKER_BLOCKED reporting.
> ----------------------------------------
>
> Key: IGNITE-13564
> URL: https://issues.apache.org/jira/browse/IGNITE-13564
> Project: Ignite
> Issue Type: Improvement
> Affects Versions: 2.9, 2.8.1
> Reporter: Ivan Daschinskiy
> Priority: Major
> Fix For: 2.10
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Currently, reporting of system thread blocking has major drawbacks.
> 1. Because a blocked system worker is detected by another thread, the failure
> handler does not receive full information about the problem.
> {{FailureContext}} has only two fields, {{type}} and {{err}}. The throwable
> {{err}} is created in the detecting thread, so the context of the original
> problem is lost.
> 2. The stack trace of the blocked thread printed in
> {{org.apache.ignite.internal.worker.WorkersRegistry#onIdle}} is incomplete.
> 3. The current approach does not work when there is only one thread in the
> registry. This case is not checked and can cause a single thread to loop
> infinitely, calling {{onIdle}}. Fixed in
> These drawbacks can lead to complete loss of information about the blocked
> system thread.
> I suggest:
> 1. Add another parameter to {{FailureContext}}, namely {{worker}}.
> 2. Fix thread dump printing.
> 3. Add an assertion for the case when there is only one system thread in the
> registry.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)