[
https://issues.apache.org/jira/browse/IGNITE-12523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexey Goncharuk updated IGNITE-12523:
--------------------------------------
Ignite Flags: Release Notes Required (was: Docs Required,Release Notes
Required)
> Continuously generated thread dumps in failure processor slow down the whole
> system
> -----------------------------------------------------------------------------------
>
> Key: IGNITE-12523
> URL: https://issues.apache.org/jira/browse/IGNITE-12523
> Project: Ignite
> Issue Type: Improvement
> Reporter: Andrey N. Gura
> Assignee: Andrey N. Gura
> Priority: Major
> Fix For: 2.9
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> A lot of threads (hundreds) build indexes. The checkpoint-thread tries to
> acquire the write lock but can’t because some threads hold the read lock.
> Moreover, some other threads try to acquire the read lock too. Failure types
> SYSTEM_WORKER_BLOCKED and SYSTEM_CRITICAL_OPERATION_TIMEOUT are ignored.
> The checkpoint-thread is treated as a blocked critical system worker, so the
> failure processor generates a thread dump.
> Threads that are waiting on the read lock report
> SYSTEM_CRITICAL_OPERATION_TIMEOUT, which also triggers a thread dump.
> Thread dump generation takes from 500 to 1000 ms.
> All this activity leads to a stop-the-world pause and triggers other timeouts.
> It can take a long time because many threads are active and half of the time
> is spent on thread dump generation.
> The root cause here is the checkpoint read-write lock. Discussed with
> [~agoncharuk], and it seems that only an implementation of fuzzy checkpoint
> could solve the problem, but that requires a big effort.
> *Solution*
> - A new system property, IGNITE_DUMP_THREADS_ON_FAILURE_THROTTLING_TIMEOUT,
> is added. Its default value is the failure detection timeout.
> - Each call of the FailureProcessor#process(FailureContext, FailureHandler)
> method checks the throttling timeout before thread dump generation.
> - There is no need to check whether the failure type is ignored. Throttling
> will be useful for all cases when the context is not invalidated
> (FailureProcessor.failureCtx != null).
> - For a throttled thread dump, an info message is logged: "Thread dump is
> hidden due to throttling settings. Set
> IGNITE_DUMP_THREADS_ON_FAILURE_THROTTLING_TIMEOUT property to 0 to see all
> thread dumps".
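
The throttling check described in the Solution list can be sketched roughly as
follows. This is only an illustration of the idea, not the code from the actual
patch: the class ThreadDumpThrottle, its methods, and the use of plain
Long.getLong to read the property are assumptions made for the example. Only
the property name and the log message come from the issue text. The caller
passes in the default timeout (the failure detection timeout, per the issue)
and the current time, which keeps the sketch easy to test.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative thread dump throttle. Hypothetical helper, not Ignite's actual code. */
final class ThreadDumpThrottle {
    /** Property name as given in the issue description. */
    static final String PROP = "IGNITE_DUMP_THREADS_ON_FAILURE_THROTTLING_TIMEOUT";

    /** Info message quoted in the issue for a suppressed dump. */
    static final String THROTTLED_MSG = "Thread dump is hidden due to throttling settings. "
        + "Set " + PROP + " property to 0 to see all thread dumps";

    /** Sentinel meaning that no dump has been generated yet. */
    private static final long NEVER = Long.MIN_VALUE;

    /** Minimum interval between two dumps; 0 or less disables throttling. */
    private final long timeoutMs;

    /** Timestamp of the last dump that was allowed. */
    private final AtomicLong lastDumpMs = new AtomicLong(NEVER);

    ThreadDumpThrottle(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** Reads the timeout from the JVM system property (assumption: a plain -D property). */
    static ThreadDumpThrottle fromSystemProperty(long dfltTimeoutMs) {
        return new ThreadDumpThrottle(Long.getLong(PROP, dfltTimeoutMs));
    }

    /**
     * Returns true if a thread dump may be generated at time nowMs.
     * Safe to call from many threads: only one caller per interval wins the CAS.
     */
    boolean tryAcquire(long nowMs) {
        if (timeoutMs <= 0)
            return true; // Throttling disabled: every dump is allowed.

        while (true) {
            long last = lastDumpMs.get();

            if (last != NEVER && nowMs - last < timeoutMs)
                return false; // A dump was generated too recently.

            if (lastDumpMs.compareAndSet(last, nowMs))
                return true;
        }
    }
}
```

A caller in the failure path would then generate the dump only when
tryAcquire(System.currentTimeMillis()) returns true, and log THROTTLED_MSG at
info level otherwise. The compare-and-set loop is used so that hundreds of
threads reporting timeouts at the same moment produce one dump per interval
instead of one dump each.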