[https://issues.apache.org/jira/browse/TEZ-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Krisztian Kasa updated TEZ-4251:
--------------------------------
Description:
*VertexImpl.getInputVertices()* acquires the read lock, but
*VertexImpl.getOutputVertices()* does not.
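A minimal sketch of what consistent locking could look like, assuming both getters should take the read lock the way *getInputVertices()* already does. `VertexSketch`, `addInputVertex` and `addOutputVertex` are hypothetical names, and vertices and edges are modelled as plain Strings instead of Tez's `Vertex` and `Edge` types; this is not the actual VertexImpl code.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for VertexImpl; vertices and edges are plain Strings here.
class VertexSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Lock readLock = lock.readLock();
  private final Lock writeLock = lock.writeLock();
  private final Map<String, String> sourceVertices = new HashMap<>();
  private final Map<String, String> targetVertices = new HashMap<>();

  // Both getters take the read lock, so neither can observe a map mid-update.
  public Map<String, String> getInputVertices() {
    readLock.lock();
    try {
      return Collections.unmodifiableMap(new HashMap<>(sourceVertices));
    } finally {
      readLock.unlock();
    }
  }

  public Map<String, String> getOutputVertices() {
    readLock.lock();
    try {
      return Collections.unmodifiableMap(new HashMap<>(targetVertices));
    } finally {
      readLock.unlock();
    }
  }

  // Writers take the write lock, which excludes all readers.
  public void addInputVertex(String vertex, String edge) {
    writeLock.lock();
    try {
      sourceVertices.put(vertex, edge);
    } finally {
      writeLock.unlock();
    }
  }

  public void addOutputVertex(String vertex, String edge) {
    writeLock.lock();
    try {
      targetVertices.put(vertex, edge);
    } finally {
      writeLock.unlock();
    }
  }
}
```

Returning a read-only copy is a choice made in this sketch so callers never touch the underlying map after the read lock is released. On its own, consistent locking would not prevent the deadlock described below, which already involves the read lock taken by *getInputVertices()*.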
We also ran into a deadlock when using Tez from Hive (see
[^container_jstack.txt]):
0. LlapTaskSchedulerService and VertexImpl each define their own
ReentrantReadWriteLock instance.
1. Thread "LlapScheduler" acquired the write lock on LlapTaskSchedulerService.lock:
{code:java}
// LlapTaskSchedulerService.java
protected void schedulePendingTasks() throws InterruptedException {
  Ref<TaskInfo> downgradedTask = new Ref<>(null);
  writeLock.lock();
  // ...
{code}
2. Thread "Dispatcher thread \{Central}" acquired the write lock on VertexImpl.lock:
{code:java}
// VertexImpl.java
public void handle(VertexEvent event) {
  // ...
  try {
    writeLock.lock();
    // ...
{code}
3. Thread "LlapScheduler" tries to acquire the read lock on VertexImpl.lock:
{code:java}
// VertexImpl.java
@Override
public Map<Vertex, Edge> getInputVertices() {
  readLock.lock();
  // ...
{code}
but it waits, because Thread "Dispatcher thread \{Central}" holds the write
lock on VertexImpl.lock.
4. Thread "Dispatcher thread \{Central}" tries to acquire the read lock on
LlapTaskSchedulerService.lock:
{code:java}
// LlapTaskSchedulerService.java
@Override
public Resource getTotalResources() {
  // ...
  readLock.lock();
  // ...
{code}
but it waits, because Thread "LlapScheduler" holds the write lock on
LlapTaskSchedulerService.lock. Each thread holds the lock the other one needs, so neither can proceed.
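Steps 1 to 4 form a classic lock-order inversion: each thread holds the write lock of one ReentrantReadWriteLock and waits for the read lock of the other. The sketch below reproduces that shape with two standalone locks. `LockOrderInversionDemo`, `holdThenTry` and `simulate` are hypothetical names; only the thread names mirror the jstack. `tryLock` with a timeout replaces `lock()` so the program reports the inversion instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical reproduction of steps 1-4. schedulerLock stands in for
// LlapTaskSchedulerService.lock and vertexLock for VertexImpl.lock.
class LockOrderInversionDemo {

  // Takes first's write lock, then tries to take second's read lock;
  // records in acquired[slot] whether the second attempt succeeded.
  private static Runnable holdThenTry(ReentrantReadWriteLock first, ReentrantReadWriteLock second,
      CountDownLatch ready, CountDownLatch attempted, boolean[] acquired, int slot,
      long timeoutMillis) {
    return () -> {
      first.writeLock().lock(); // steps 1 and 2
      try {
        ready.countDown();
        ready.await(); // both threads now hold one write lock each
        // steps 3 and 4: a read lock cannot be granted while another thread holds the write lock
        boolean ok = second.readLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
        if (ok) {
          second.readLock().unlock();
        }
        acquired[slot] = ok;
        attempted.countDown();
        attempted.await(); // keep the write lock until both attempts are over
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        first.writeLock().unlock();
      }
    };
  }

  // Returns {LlapScheduler got vertex read lock, Dispatcher got scheduler read lock}.
  static boolean[] simulate(long timeoutMillis) throws InterruptedException {
    ReentrantReadWriteLock schedulerLock = new ReentrantReadWriteLock();
    ReentrantReadWriteLock vertexLock = new ReentrantReadWriteLock();
    CountDownLatch ready = new CountDownLatch(2);
    CountDownLatch attempted = new CountDownLatch(2);
    boolean[] acquired = new boolean[2];

    Thread scheduler = new Thread(
        holdThenTry(schedulerLock, vertexLock, ready, attempted, acquired, 0, timeoutMillis),
        "LlapScheduler");
    Thread dispatcher = new Thread(
        holdThenTry(vertexLock, schedulerLock, ready, attempted, acquired, 1, timeoutMillis),
        "Dispatcher thread {Central}");
    scheduler.start();
    dispatcher.start();
    scheduler.join(); // join() makes the writes to acquired visible here
    dispatcher.join();
    return acquired;
  }

  public static void main(String[] args) throws InterruptedException {
    boolean[] acquired = simulate(200);
    System.out.println("LlapScheduler acquired vertex read lock: " + acquired[0]);
    System.out.println("Dispatcher acquired scheduler read lock: " + acquired[1]);
  }
}
```

Both lines print `false`: each `tryLock` times out because the other thread keeps its write lock for the whole attempt. With plain `lock()` calls, as in steps 3 and 4, neither thread would ever return.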
was:
*VertexImpl.getInputVertices()* acquires the read lock, but
*VertexImpl.getOutputVertices()* does not.
We also ran into a deadlock when using Tez from Hive (see
[^container_jstack.txt]):
0. LlapTaskSchedulerService and VertexImpl each define their own
ReentrantReadWriteLock instance.
LlapTaskSchedulerService: [https://github.com/hortonworks/hive/blob/55b48401d7d3354b259db46e5029670db91944bb/llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java#L229]
VertexImpl: [https://github.com/hortonworks/tez/blob/e5a04482b726565efe51d8103476aa65e455245f/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java#L903]
1. Thread "LlapScheduler" acquired the write lock on LlapTaskSchedulerService.lock:
{code:java}
// LlapTaskSchedulerService.java
protected void schedulePendingTasks() throws InterruptedException {
  Ref<TaskInfo> downgradedTask = new Ref<>(null);
  writeLock.lock();
  // ...
{code}
2. Thread "Dispatcher thread \{Central}" acquired the write lock on VertexImpl.lock:
{code:java}
// VertexImpl.java
public void handle(VertexEvent event) {
  // ...
  try {
    writeLock.lock();
    // ...
{code}
3. Thread "LlapScheduler" tries to acquire the read lock on VertexImpl.lock:
{code:java}
// VertexImpl.java
@Override
public Map<Vertex, Edge> getInputVertices() {
  readLock.lock();
  // ...
{code}
but it waits, because Thread "Dispatcher thread \{Central}" holds the write
lock on VertexImpl.lock.
4. Thread "Dispatcher thread \{Central}" tries to acquire the read lock on
LlapTaskSchedulerService.lock:
{code:java}
// LlapTaskSchedulerService.java
@Override
public Resource getTotalResources() {
  // ...
  readLock.lock();
  // ...
{code}
but it waits, because Thread "LlapScheduler" holds the write lock on
LlapTaskSchedulerService.lock.
> Acquiring locks for getInputVertices and getOutputVertices is not consistent
> ----------------------------------------------------------------------------
>
> Key: TEZ-4251
> URL: https://issues.apache.org/jira/browse/TEZ-4251
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Krisztian Kasa
> Priority: Major
> Attachments: container_jstack.txt
--
This message was sent by Atlassian Jira
(v8.3.4#803005)