[ 
https://issues.apache.org/jira/browse/TEZ-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated TEZ-4251:
--------------------------------
    Attachment: TEZ-4251.1.patch

> Acquiring locks for getInputVertices and getOutputVertices is not consistent
> ----------------------------------------------------------------------------
>
>                 Key: TEZ-4251
>                 URL: https://issues.apache.org/jira/browse/TEZ-4251
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Krisztian Kasa
>            Priority: Major
>         Attachments: TEZ-4251.1.patch, container_jstack.txt
>
>
> *VertexImpl.getInputVertices()* acquires the read lock, but 
> *VertexImpl.getOutputVertices()* does not.
> We also ran into a deadlock when using Tez from Hive: see 
> [^container_jstack.txt]
> 0. Both LlapTaskSchedulerService and VertexImpl define their own 
> ReentrantReadWriteLock instance.
>  1. Thread "LlapScheduler" acquired write lock on 
> LlapTaskSchedulerService.lock
> {code:java}
> LlapTaskSchedulerService.java
>   protected void schedulePendingTasks() throws InterruptedException {
>     Ref<TaskInfo> downgradedTask = new Ref<>(null);
>     writeLock.lock();
> {code}
> 2. Thread "Dispatcher thread \{Central}" acquired write lock on 
> VertexImpl.lock
> {code:java}
> VertexImpl.java
>   public void handle(VertexEvent event) {
> ...
>     try {
>       writeLock.lock();
> {code}
> 3. Thread "LlapScheduler" tries to acquire the read lock on VertexImpl.lock
> {code:java}
> VertexImpl.java
>   @Override
>   public Map<Vertex, Edge> getInputVertices() {
>     readLock.lock();
> {code}
> but it is waiting because Thread "Dispatcher thread \{Central}" holds the 
> write lock on VertexImpl.lock
> 4. Thread "Dispatcher thread \{Central}" tries to acquire the read lock on 
> LlapTaskSchedulerService.lock
> {code:java}
> LlapTaskSchedulerService.java
>   @Override
>   public Resource getTotalResources() {
> ...
>     readLock.lock();
> {code}
> but it is waiting because Thread "LlapScheduler" holds the write lock on 
> LlapTaskSchedulerService.lock
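
The four steps above form a classic lock-ordering cycle, and the following is a minimal, self-contained sketch of it. It is not Tez or Hive code: the class LockOrderDemo, its two lock fields, and its helper methods are hypothetical stand-ins for LlapTaskSchedulerService.lock and VertexImpl.lock. To keep the demo from hanging, it uses tryLock with a timeout where the real code calls lock(), so each thread reports a failed acquisition instead of blocking forever.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockOrderDemo {
    // Hypothetical stand-ins for LlapTaskSchedulerService.lock and VertexImpl.lock.
    static final ReentrantReadWriteLock schedulerLock = new ReentrantReadWriteLock();
    static final ReentrantReadWriteLock vertexLock = new ReentrantReadWriteLock();

    // Takes the write lock on `own`, waits until the peer thread holds its own
    // write lock too, then tries to read-lock `other` with a timeout.
    static boolean holdWriteThenTryRead(ReentrantReadWriteLock own,
                                        ReentrantReadWriteLock other,
                                        CountDownLatch bothHeld,
                                        CountDownLatch bothTried) throws InterruptedException {
        own.writeLock().lock();                  // steps 1 and 2
        try {
            bothHeld.countDown();
            bothHeld.await();                    // both write locks are now held
            boolean acquired =
                other.readLock().tryLock(200, TimeUnit.MILLISECONDS); // steps 3 and 4
            if (acquired) {
                other.readLock().unlock();
            }
            bothTried.countDown();
            bothTried.await();                   // keep the write lock until both have tried
            return acquired;
        } finally {
            own.writeLock().unlock();
        }
    }

    // Runs both threads; returns {schedulerGotVertexRead, dispatcherGotSchedulerRead}.
    static boolean[] runDemo() {
        CountDownLatch bothHeld = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        boolean[] acquired = new boolean[2];
        Thread scheduler = new Thread(() -> {
            try {
                acquired[0] = holdWriteThenTryRead(schedulerLock, vertexLock, bothHeld, bothTried);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "LlapScheduler");
        Thread dispatcher = new Thread(() -> {
            try {
                acquired[1] = holdWriteThenTryRead(vertexLock, schedulerLock, bothHeld, bothTried);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "Dispatcher");
        scheduler.start();
        dispatcher.start();
        try {
            scheduler.join();
            dispatcher.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return acquired;
    }

    public static void main(String[] args) {
        boolean[] r = runDemo();
        System.out.println("LlapScheduler got VertexImpl read lock: " + r[0]);
        System.out.println("Dispatcher got scheduler read lock: " + r[1]);
    }
}
```

With plain lock() calls, as in the snippets quoted above, neither thread would ever return. The usual remedies are to avoid calling into the other component while holding a write lock, or to acquire the two locks in one fixed order everywhere.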



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
