[ 
https://issues.apache.org/jira/browse/TEZ-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated TEZ-4251:
--------------------------------
    Description: 
*VertexImpl.getInputVertices()* acquires the read lock, but 
*VertexImpl.getOutputVertices()* does not.

We also ran into a deadlock when using Tez from Hive: see 
[^container_jstack.txt]

0. Both LlapTaskSchedulerService and VertexImpl define their own 
ReentrantReadWriteLock instance.
 1. Thread "LlapScheduler" acquired write lock on LlapTaskSchedulerService.lock
{code:java}
LlapTaskSchedulerService.java
  protected void schedulePendingTasks() throws InterruptedException {
    Ref<TaskInfo> downgradedTask = new Ref<>(null);
    writeLock.lock();
{code}
2. Thread "Dispatcher thread \{Central}" acquired write lock on VertexImpl.lock
{code:java}
VertexImpl.java
  public void handle(VertexEvent event) {
...
    try {
      writeLock.lock();
{code}
3. Thread "LlapScheduler" tries acquiring read lock on VertexImpl.lock
{code:java}
VertexImpl.java
  @Override
  public Map<Vertex, Edge> getInputVertices() {
    readLock.lock();
{code}
but it is waiting because Thread "Dispatcher thread \{Central}" holds the write 
lock on VertexImpl.lock

4. Thread "Dispatcher thread \{Central}" tries to acquire read lock on 
LlapTaskSchedulerService.lock
{code:java}
LlapTaskSchedulerService.java
  @Override
  public Resource getTotalResources() {
...
    readLock.lock();
{code}
but it is waiting because Thread "LlapScheduler" holds the write lock on 
LlapTaskSchedulerService.lock
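
Steps 1 to 4 above describe a lock-order inversion: each thread holds the write lock on one ReentrantReadWriteLock and then asks for the read lock on the other. The following is only a minimal, self-contained sketch of that pattern, not Tez or Hive code. The class name and the fields `schedulerLock` and `vertexLock` are hypothetical stand-ins for LlapTaskSchedulerService.lock and VertexImpl.lock, and a timed `tryLock` is assumed in place of `lock()` so that the program terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockOrderInversionSketch {
  // Hypothetical stand-ins for LlapTaskSchedulerService.lock and VertexImpl.lock.
  static final ReentrantReadWriteLock schedulerLock = new ReentrantReadWriteLock();
  static final ReentrantReadWriteLock vertexLock = new ReentrantReadWriteLock();

  // Holds the write lock on `held`, then makes a timed attempt at the read lock on `wanted`.
  static boolean writeThenRead(ReentrantReadWriteLock held, ReentrantReadWriteLock wanted,
      CountDownLatch bothHoldWrite, CountDownLatch bothTried) throws InterruptedException {
    held.writeLock().lock();
    try {
      // Steps 1 and 2: wait until the other thread also holds its write lock.
      bothHoldWrite.countDown();
      bothHoldWrite.await();
      // Steps 3 and 4: lock() would block forever here; tryLock gives up after 200 ms.
      boolean acquired = wanted.readLock().tryLock(200, TimeUnit.MILLISECONDS);
      if (acquired) {
        wanted.readLock().unlock();
      }
      // Keep the write lock until the other thread has finished its attempt too.
      bothTried.countDown();
      bothTried.await();
      return acquired;
    } finally {
      held.writeLock().unlock();
    }
  }

  // Runs both threads and reports whether each one got its read lock.
  static boolean[] run() throws InterruptedException {
    CountDownLatch bothHoldWrite = new CountDownLatch(2);
    CountDownLatch bothTried = new CountDownLatch(2);
    boolean[] acquired = new boolean[2];
    Thread scheduler = new Thread(() -> {
      try {
        acquired[0] = writeThenRead(schedulerLock, vertexLock, bothHoldWrite, bothTried);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "LlapScheduler");
    Thread dispatcher = new Thread(() -> {
      try {
        acquired[1] = writeThenRead(vertexLock, schedulerLock, bothHoldWrite, bothTried);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "Dispatcher thread {Central}");
    scheduler.start();
    dispatcher.start();
    scheduler.join();
    dispatcher.join();
    return acquired;
  }

  public static void main(String[] args) throws InterruptedException {
    boolean[] acquired = run();
    System.out.println("LlapScheduler got the vertex read lock: " + acquired[0]);
    System.out.println("Dispatcher got the scheduler read lock: " + acquired[1]);
  }
}
```

In this sketch both timed attempts fail, because each read lock is requested while the other thread still holds the corresponding write lock. With plain `lock()` calls, as in the snippets above, neither thread would ever proceed.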

  was:
*VertexImpl.getInputVertices()* acquires the read lock, but 
*VertexImpl.getOutputVertices()* does not.

We also ran into a deadlock when using Tez from Hive: see 
[^container_jstack.txt]

0. Both LlapTaskSchedulerService and VertexImpl define their own 
ReentrantReadWriteLock instance.
 LlapTaskSchedulerService: 
[https://github.com/hortonworks/hive/blob/55b48401d7d3354b259db46e5029670db91944bb/llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java#L229]
 VertexImpl: 
[https://github.com/hortonworks/tez/blob/e5a04482b726565efe51d8103476aa65e455245f/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java#L903]

1. Thread "LlapScheduler" acquired write lock on LlapTaskSchedulerService.lock
{code:java}
LlapTaskSchedulerService.java
  protected void schedulePendingTasks() throws InterruptedException {
    Ref<TaskInfo> downgradedTask = new Ref<>(null);
    writeLock.lock();
{code}
2. Thread "Dispatcher thread \{Central}" acquired write lock on VertexImpl.lock
{code:java}
VertexImpl.java
  public void handle(VertexEvent event) {
...
    try {
      writeLock.lock();
{code}
3. Thread "LlapScheduler" tries acquiring read lock on VertexImpl.lock
{code:java}
VertexImpl.java
  @Override
  public Map<Vertex, Edge> getInputVertices() {
    readLock.lock();
{code}
but it is waiting because Thread "Dispatcher thread \{Central}" holds the write 
lock on VertexImpl.lock


4. Thread "Dispatcher thread \{Central}" tries to acquire read lock on 
LlapTaskSchedulerService.lock
{code:java}
LlapTaskSchedulerService.java
  @Override
  public Resource getTotalResources() {
...
    readLock.lock();
{code}
but it is waiting because Thread "LlapScheduler" holds the write lock on 
LlapTaskSchedulerService.lock


> Acquiring locks for getInputVertices and getOutputVertices is not consistent
> ----------------------------------------------------------------------------
>
>                 Key: TEZ-4251
>                 URL: https://issues.apache.org/jira/browse/TEZ-4251
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Krisztian Kasa
>            Priority: Major
>         Attachments: container_jstack.txt
>
>
> *VertexImpl.getInputVertices()* acquires the read lock, but 
> *VertexImpl.getOutputVertices()* does not.
> We also ran into a deadlock when using Tez from Hive: see 
> [^container_jstack.txt]
> 0. Both LlapTaskSchedulerService and VertexImpl define their own 
> ReentrantReadWriteLock instance.
>  1. Thread "LlapScheduler" acquired write lock on 
> LlapTaskSchedulerService.lock
> {code:java}
> LlapTaskSchedulerService.java
>   protected void schedulePendingTasks() throws InterruptedException {
>     Ref<TaskInfo> downgradedTask = new Ref<>(null);
>     writeLock.lock();
> {code}
> 2. Thread "Dispatcher thread \{Central}" acquired write lock on 
> VertexImpl.lock
> {code:java}
> VertexImpl.java
>   public void handle(VertexEvent event) {
> ...
>     try {
>       writeLock.lock();
> {code}
> 3. Thread "LlapScheduler" tries acquiring read lock on VertexImpl.lock
> {code:java}
> VertexImpl.java
>   @Override
>   public Map<Vertex, Edge> getInputVertices() {
>     readLock.lock();
> {code}
> but it is waiting because Thread "Dispatcher thread \{Central}" holds the 
> write lock on VertexImpl.lock
> 4. Thread "Dispatcher thread \{Central}" tries to acquire read lock on 
> LlapTaskSchedulerService.lock
> {code:java}
> LlapTaskSchedulerService.java
>   @Override
>   public Resource getTotalResources() {
> ...
>     readLock.lock();
> {code}
> but it is waiting because Thread "LlapScheduler" holds the write lock on 
> LlapTaskSchedulerService.lock



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
