[
https://issues.apache.org/jira/browse/TEZ-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235358#comment-17235358
]
László Bodor commented on TEZ-4251:
-----------------------------------
[~jeagles]: could you please give contributor rights to [~kkasa] as well?
thanks in advance
> Acquiring locks for getInputVertices and getOutputVertices is not consistent
> ----------------------------------------------------------------------------
>
> Key: TEZ-4251
> URL: https://issues.apache.org/jira/browse/TEZ-4251
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Krisztian Kasa
> Priority: Major
> Attachments: TEZ-4251.1.patch, container_jstack.txt
>
>
> *VertexImpl.getInputVertices()* acquires the read lock, but
> *VertexImpl.getOutputVertices()* does not.
> We also ran into a deadlock when using Tez from Hive; see
> [^container_jstack.txt]
> 0. LlapTaskSchedulerService and VertexImpl each define their own
> ReentrantReadWriteLock instance.
> 1. Thread "LlapScheduler" acquired write lock on
> LlapTaskSchedulerService.lock
> {code:java}
> LlapTaskSchedulerService.java
> protected void schedulePendingTasks() throws InterruptedException {
> Ref<TaskInfo> downgradedTask = new Ref<>(null);
> writeLock.lock();
> {code}
> 2. Thread "Dispatcher thread \{Central}" acquired write lock on
> VertexImpl.lock
> {code:java}
> VertexImpl.java
> public void handle(VertexEvent event) {
> ...
> try {
> writeLock.lock();
> {code}
> 3. Thread "LlapScheduler" tries to acquire the read lock on VertexImpl.lock
> {code:java}
> VertexImpl.java
> @Override
> public Map<Vertex, Edge> getInputVertices() {
> readLock.lock();
> {code}
> but it is waiting because Thread "Dispatcher thread \{Central}" holds the
> write lock on VertexImpl.lock
> 4. Thread "Dispatcher thread \{Central}" tries to acquire the read lock on
> LlapTaskSchedulerService.lock
> {code:java}
> LlapTaskSchedulerService.java
> @Override
> public Resource getTotalResources() {
> ...
> readLock.lock();
> {code}
> but it is waiting because Thread "LlapScheduler" holds the write lock on
> LlapTaskSchedulerService.lock
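
The four steps above form a lock-order inversion: each thread holds the write lock on one ReentrantReadWriteLock and then waits for the read lock on the other. Below is a minimal, self-contained Java sketch of that cycle. It is not Tez or Hive code: the class name, the two static lock fields, and the helper methods are hypothetical stand-ins for LlapTaskSchedulerService.lock and VertexImpl.lock. It also uses a timed tryLock in place of the blocking lock() calls, purely so the demo terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockOrderInversionSketch {
    // Hypothetical stand-ins for LlapTaskSchedulerService.lock and VertexImpl.lock.
    static final ReentrantReadWriteLock schedulerLock = new ReentrantReadWriteLock();
    static final ReentrantReadWriteLock vertexLock = new ReentrantReadWriteLock();

    // Holds the write lock on `held`, then attempts the read lock on `wanted`.
    // Returns true only if the read lock was obtained within timeoutMs.
    static boolean writeThenRead(ReentrantReadWriteLock held, ReentrantReadWriteLock wanted,
                                 CountDownLatch bothHoldWrite, CountDownLatch bothTried,
                                 long timeoutMs) throws InterruptedException {
        held.writeLock().lock();
        try {
            // Steps 1 and 2: wait until both threads hold their own write lock.
            bothHoldWrite.countDown();
            bothHoldWrite.await();
            // Steps 3 and 4: a timed attempt instead of a blocking lock().
            boolean acquired = wanted.readLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS);
            if (acquired) {
                wanted.readLock().unlock();
            }
            // Keep the write lock until the other thread has finished its attempt too.
            bothTried.countDown();
            bothTried.await();
            return acquired;
        } finally {
            held.writeLock().unlock();
        }
    }

    // Runs both threads; index 0 is the scheduler's result, index 1 the dispatcher's.
    static boolean[] run(long timeoutMs) {
        CountDownLatch bothHoldWrite = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        boolean[] acquired = new boolean[2];
        Thread scheduler = new Thread(() -> {
            try {
                acquired[0] = writeThenRead(schedulerLock, vertexLock,
                        bothHoldWrite, bothTried, timeoutMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "LlapScheduler");
        Thread dispatcher = new Thread(() -> {
            try {
                acquired[1] = writeThenRead(vertexLock, schedulerLock,
                        bothHoldWrite, bothTried, timeoutMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "Dispatcher");
        scheduler.start();
        dispatcher.start();
        try {
            scheduler.join();
            dispatcher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return acquired;
    }

    public static void main(String[] args) {
        boolean[] acquired = run(200);
        System.out.println("LlapScheduler got VertexImpl read lock: " + acquired[0]);
        System.out.println("Dispatcher got scheduler read lock: " + acquired[1]);
    }
}
```

Both lines print false, because each read-lock attempt times out while the other thread still holds the write lock. With the blocking readLock.lock() calls from the stack trace, neither thread would ever return. Common ways to break such a cycle are to acquire the two locks in a fixed order, or to avoid calling into the other component while holding the write lock; which of these fits TEZ-4251 is left to the attached patch.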