[
https://issues.apache.org/jira/browse/YUNIKORN-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17834103#comment-17834103
]
Craig Condit commented on YUNIKORN-2539:
----------------------------------------
The core PR is done; the shim is still TODO. The code impact isn't too bad: I
was able to make it a drop-in replacement for sync.{RW}Mutex. The overhead when
disabled is minimal, on the order of 4ns per lock on my Intel laptop. It's
quite a bit slower when turned on, but that's to be expected. The
out-of-the-box experience should be good enough to leave it compiled in. I've
set it up so that env vars have to be set to turn it on, as it's not safe to
activate once locks have started to be acquired.
Memory impact when activated should be minimal; only stack dumps for active
locks have to be held. There could be some GC impact, however. The hope is that
this can be enabled as a last-ditch effort to troubleshoot deadlocks that don't
lend themselves to simple stack dump analysis.
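To illustrate why memory stays bounded while the GC may still see extra work, here is a rough sketch that keeps a stack dump only while a lock is held. The TrackedMutex type, the heldLocks registry, and heldCount are hypothetical names for this example and are not taken from the PR.

```go
package main

import (
	"fmt"
	"runtime/debug"
	"sync"
)

// heldLocks maps each currently held lock to the stack captured when it
// was acquired. Entries are removed on Unlock, so the map size tracks
// the number of active locks, not the total number of acquisitions.
var (
	registryMu sync.Mutex
	heldLocks  = map[*TrackedMutex][]byte{}
)

// TrackedMutex is a hypothetical wrapper used only for this sketch.
type TrackedMutex struct {
	mu sync.Mutex
}

// Lock acquires the mutex, then records the caller's stack.
// debug.Stack allocates a new byte slice on every call, which is where
// the garbage-collection pressure would come from.
func (m *TrackedMutex) Lock() {
	m.mu.Lock()
	stack := debug.Stack()
	registryMu.Lock()
	heldLocks[m] = stack
	registryMu.Unlock()
}

// Unlock drops the recorded stack before releasing the mutex.
func (m *TrackedMutex) Unlock() {
	registryMu.Lock()
	delete(heldLocks, m)
	registryMu.Unlock()
	m.mu.Unlock()
}

// heldCount reports how many locks currently have a recorded stack.
func heldCount() int {
	registryMu.Lock()
	defer registryMu.Unlock()
	return len(heldLocks)
}

func main() {
	var m TrackedMutex
	m.Lock()
	fmt.Println("held locks:", heldCount()) // prints "held locks: 1"
	m.Unlock()
	fmt.Println("held locks:", heldCount()) // prints "held locks: 0"
}
```

The recorded stacks are what a detector would print when it suspects a deadlock, which helps in cases where a plain goroutine dump does not show who acquired a lock first.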
> Add optional deadlock detection
> -------------------------------
>
> Key: YUNIKORN-2539
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2539
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: core - scheduler, shim - kubernetes
> Reporter: Craig Condit
> Assignee: Craig Condit
> Priority: Major
> Labels: pull-request-available
>
> We make heavy use of sync.Mutex and sync.RWMutex in our code. Unfortunately,
> while these are very performant, they can lead to difficult-to-diagnose
> deadlocks.
> If we substitute our own locking routines, we can optionally enable deadlock
> detection. See [https://github.com/sasha-s/go-deadlock] for a possible
> solution.