[
https://issues.apache.org/jira/browse/TEZ-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536915#comment-14536915
]
Bikas Saha commented on TEZ-2421:
---------------------------------
bq. I looked at the jstack trace and am not sure where the deadlock is. App
Shared Pool - #1 is trying to acquire VertexImpl's writelock, and no other
thread holds the writelock; some threads are only trying to acquire the readlock.
Thread 1 holds the readlock on V1 and is trying to acquire the readlock on V2.
Thread 2 wants to acquire the writelock on V1 and is blocked because Thread 1
holds the readlock. Thread 3 holds the writelock on V2 and is trying to acquire
the readlock on V1, which is blocked by Thread 2's pending writelock request.
Thus the 3 threads have locked each other out. This reproduces when
TestAMRecovery is run in a loop, or when a large job (especially one with 1-1
edges) is run in a cluster in a loop.
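The key step in the cycle above is that a pending writelock request can hold
back a new readlock request even though only readers currently hold the lock.
A minimal sketch of that one step follows, assuming the vertex locks behave
like java.util.concurrent.locks.ReentrantReadWriteLock in its default
(non-fair) mode. The class and method names are hypothetical and are not taken
from the Tez code or the attached patches.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class, not part of Tez.
public class PendingWriterDemo {

    /** Returns true if a new reader obtained the read lock while a writer was queued. */
    static boolean newReaderAcquiresWhileWriterPending() throws InterruptedException {
        // Stands in for V1's lock (assumption: a plain ReentrantReadWriteLock).
        ReentrantReadWriteLock v1 = new ReentrantReadWriteLock();
        CountDownLatch readHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // "Thread 1": holds the read lock on V1 until told to release it.
        Thread t1 = new Thread(() -> {
            v1.readLock().lock();
            readHeld.countDown();
            try {
                release.await();
            } catch (InterruptedException ignored) {
                // fall through and release the lock
            } finally {
                v1.readLock().unlock();
            }
        });
        t1.start();
        readHeld.await();

        // "Thread 2": requests the write lock on V1 and queues behind the reader.
        Thread t2 = new Thread(() -> {
            v1.writeLock().lock();
            v1.writeLock().unlock();
        });
        t2.start();
        while (!v1.hasQueuedThread(t2)) {
            Thread.sleep(5); // wait until the write request is actually queued
        }

        // "Thread 3" (the calling thread): a timed read attempt on V1.
        boolean acquired = v1.readLock().tryLock(200, TimeUnit.MILLISECONDS);
        if (acquired) {
            v1.readLock().unlock();
        }

        // Let everything finish so the demo itself does not hang.
        release.countDown();
        t1.join();
        t2.join();
        return acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("new reader acquired: " + newReaderAcquiresWhileWriterPending());
    }
}
```

On OpenJDK the timed read attempt is expected to time out while the write
request is queued. In the scenario above, Thread 3 waits without a timeout
while holding the V2 writelock that Thread 1 needs, so the cycle never breaks.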
Attaching a patch that fixes the locking issues. Verified by running
TestAMRecovery etc. in a loop and a large job in the cluster in a loop.
> Deadlock in AM because attempt and vertex locking each other out
> ----------------------------------------------------------------
>
> Key: TEZ-2421
> URL: https://issues.apache.org/jira/browse/TEZ-2421
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Bikas Saha
> Assignee: Bikas Saha
> Priority: Blocker
> Attachments: TEZ-2421.1.patch, TEZ-2421.2.patch, TEZ-2421.3.patch
>
>
> Ideally locks should be taken one way - either going down or up. Preferably
> not going up because most such data can be passed in during object
> construction.
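One way to read the guideline above is sketched below: the parent hands
immutable data to the child at construction, so the child never has to call
back up into the parent and take its lock. This is only an illustration under
assumed names. Vertex, Attempt, and their methods are hypothetical and do not
reflect the actual Tez classes or the attached patches.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration of one-way (downward) locking, not Tez code.
public class OneWayLocking {

    static final class Vertex {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        private final String name;   // immutable, safe to hand to children
        private int completedAttempts;

        Vertex(String name) {
            this.name = name;
        }

        // Data the child needs is passed down at construction time.
        Attempt newAttempt(int id) {
            return new Attempt(id, name);
        }

        // Mutable vertex state is only touched under the vertex's own lock.
        void attemptCompleted() {
            lock.writeLock().lock();
            try {
                completedAttempts++;
            } finally {
                lock.writeLock().unlock();
            }
        }

        int completedAttempts() {
            lock.readLock().lock();
            try {
                return completedAttempts;
            } finally {
                lock.readLock().unlock();
            }
        }
    }

    static final class Attempt {
        private final int id;
        private final String vertexName; // copied in, so no call back up to Vertex

        Attempt(int id, String vertexName) {
            this.id = id;
            this.vertexName = vertexName;
        }

        // Needs no vertex lock: everything it reads is local and immutable.
        String describe() {
            return vertexName + "_attempt_" + id;
        }
    }

    public static void main(String[] args) {
        Vertex v = new Vertex("v1");
        Attempt a = v.newAttempt(3);
        v.attemptCompleted();
        System.out.println(a.describe() + " completed=" + v.completedAttempts());
    }
}
```

Because Attempt holds no reference to Vertex, no code path in the attempt can
take the vertex lock, which removes one direction from the lock graph.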