[ 
https://issues.apache.org/jira/browse/TEZ-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17076468#comment-17076468
 ] 

László Bodor commented on TEZ-3967:
-----------------------------------

[~gopalv], [~jeagles]: could you please take a look at  [^TEZ-3967.01.patch] 
it's basically about narrowing the scope of the read locking

1. removed read lock from getDAGStatus, as it doesn't contain sensitive 
operation I think (except counters, but it's handled in called getAllCounters 
method)
-> first I thought that iterating vertexMap's entryset could be dangerous 
without lock, but I don't expect ConcurrentModificationException as vertexMap 
is initialized initializeDAG and then the number of entries are not supposed to 
change

2. getAllCounters
removed readlock from the beginning of the method, as 
mayBeConstructFinalFullCounters is protected with a fullCountersLock itself + 
getInternalState method also has its own locking

do you think this patch can help getDAGStatus lock footprint? if so, let me 
know if you want me to do any performance/sanity check

> DAGImpl: dag lock is unfair and can starve the writers
> ------------------------------------------------------
>
>                 Key: TEZ-3967
>                 URL: https://issues.apache.org/jira/browse/TEZ-3967
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Gopal Vijayaraghavan
>            Assignee: László Bodor
>            Priority: Major
>         Attachments: TEZ-3967.01.patch
>
>
> Found when debugging HIVE-20103, that a reader arriving when another reader 
> is active can postpone a writer from obtaining a write-lock.
> This is fundamentally bad for the DAGImpl as useful progress can only happen 
> when the writeLock is held.
> {code}
>   public void handle(DAGEvent event) {
> ...
>     try {
>       writeLock.lock();
> {code}
> {code}
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00007efb02246f40> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>         at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
>         at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1162)
>         at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:149)
>         at 
> org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:2251)
>         at 
> org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:2242)
>         at 
> org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:180)
>         at 
> org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:115)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> while read-lock is passed around between 
> {code}
>        at 
> org.apache.tez.dag.app.dag.impl.DAGImpl.getDAGStatus(DAGImpl.java:901)
>         at 
> org.apache.tez.dag.app.dag.impl.DAGImpl.getDAGStatus(DAGImpl.java:940)
>         at 
> org.apache.tez.dag.api.client.DAGClientHandler.getDAGStatus(DAGClientHandler.java:73)
> {code}
> calls.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to