[
https://issues.apache.org/jira/browse/TEZ-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17076543#comment-17076543
]
Jonathan Turner Eagles commented on TEZ-3967:
---------------------------------------------
This may not address the issue at hand. For the getAllCounters case, moving
the getInternalState call outside of the readlock actually makes the call more
expensive, because getInternalState has to lock and unlock the readlock
itself. While splitting the lock is more costly, it gives another call the
opportunity to acquire the read lock in between, perhaps reducing the average
latency of acquiring the lock for other calls. On the other hand, you allow
the possibility of getting inconsistent results, as a write could have taken
place between the two readlocks.
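To illustrate the inconsistency risk, here is a minimal sketch, not the DAGImpl
code. The class SplitReadLockSketch, its fields, and the betweenLocks hook are
all hypothetical; the hook only stands in for a writer that happens to be
scheduled in the gap between the two read-lock acquisitions.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SplitReadLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Both fields are guarded by lock and are meant to change together.
    private String state = "RUNNING";
    private int succeededTasks = 0;

    // Writer: updates state and counter atomically under the write lock.
    public void finish(int succeeded) {
        lock.writeLock().lock();
        try {
            state = "SUCCEEDED";
            succeededTasks = succeeded;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // One acquisition: state and counter always come from the same version.
    public String readTogether() {
        lock.readLock().lock();
        try {
            return state + ":" + succeededTasks;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Two acquisitions: a writer can run in the gap between them.
    public String readSplit(Runnable betweenLocks) {
        String seenState;
        lock.readLock().lock();
        try {
            seenState = state;
        } finally {
            lock.readLock().unlock();
        }
        betweenLocks.run(); // hypothetical hook simulating an interleaved write
        lock.readLock().lock();
        try {
            return seenState + ":" + succeededTasks;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        SplitReadLockSketch dag = new SplitReadLockSketch();
        System.out.println(dag.readSplit(() -> dag.finish(10))); // RUNNING:10
        System.out.println(dag.readTogether());                  // SUCCEEDED:10
    }
}
```

The split read reports a RUNNING state alongside a counter that was only
written when the state became SUCCEEDED, which is the kind of mixed result a
single read-lock section rules out.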
Similarly for the getDAGStatus calls that were changed as well. The difference
there is that some of the work currently done inside the readlock could
actually be moved outside of it without risk, though I am not sure how much
benefit that would bring.
{code:title=Safe to access outside of the readlock}
ProgressBuilder dagProgress = new ProgressBuilder();
dagProgress.setTotalTaskCount(totalTaskCount);
dagProgress.setSucceededTaskCount(totalSucceededTaskCount);
dagProgress.setRunningTaskCount(totalRunningTaskCount);
dagProgress.setFailedTaskCount(totalFailedTaskCount);
dagProgress.setKilledTaskCount(totalKilledTaskCount);
dagProgress.setFailedTaskAttemptCount(totalFailedTaskAttemptCount);
dagProgress.setKilledTaskAttemptCount(totalKilledTaskAttemptCount);
dagProgress.setRejectedTaskAttemptCount(totalRejectedTaskAttemptCount);
{code}
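The pattern behind that snippet can be sketched as follows: read the shared
counters into locals while holding the read lock, release it, and only then
assemble the result object. This is an illustration under assumptions, not the
actual getDAGStatus code; the Progress class is a hypothetical stand-in for
ProgressBuilder, and the per-vertex arrays are made up.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class BuildOutsideLockSketch {
    // Hypothetical stand-in for Tez's ProgressBuilder; not the real class.
    public static final class Progress {
        public final int total;
        public final int succeeded;
        public final int running;

        Progress(int total, int succeeded, int running) {
            this.total = total;
            this.succeeded = succeeded;
            this.running = running;
        }
    }

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Made-up per-vertex counters, guarded by lock.
    private final int[] vertexTotal = {4, 6};
    private final int[] vertexSucceeded = {4, 1};
    private final int[] vertexRunning = {0, 3};

    public Progress getProgress() {
        int total = 0;
        int succeeded = 0;
        int running = 0;
        lock.readLock().lock();
        try {
            // Shared state is only read here, inside the critical section.
            for (int i = 0; i < vertexTotal.length; i++) {
                total += vertexTotal[i];
                succeeded += vertexSucceeded[i];
                running += vertexRunning[i];
            }
        } finally {
            lock.readLock().unlock();
        }
        // Only locals are touched from here on, so no lock is required.
        return new Progress(total, succeeded, running);
    }

    public static void main(String[] args) {
        Progress p = new BuildOutsideLockSketch().getProgress();
        System.out.println(p.succeeded + "/" + p.total + " succeeded, "
            + p.running + " running"); // 5/10 succeeded, 3 running
    }
}
```

Constructing one small object is cheap, so the time saved under the lock is
likely modest, in line with the doubt expressed above.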
Alternatively, a new approach could be taken where the calculations are done on
a periodic basis and the read calls return only cached values.
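A rough sketch of that cached approach, assuming a single background refresher
and an atomically swapped snapshot. CachedStatusSketch, refresh, and
startRefresher are hypothetical names and do not come from Tez; the only
library calls used are from java.util.concurrent.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CachedStatusSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private int succeededTasks = 0; // guarded by lock
    // Last published snapshot; readers never touch the lock to get it.
    private final AtomicReference<String> cached =
        new AtomicReference<>("succeeded=0");

    // Writer path, e.g. an event handler.
    public void taskSucceeded() {
        lock.writeLock().lock();
        try {
            succeededTasks++;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Takes the read lock once per period instead of once per client call.
    public void refresh() {
        int value;
        lock.readLock().lock();
        try {
            value = succeededTasks;
        } finally {
            lock.readLock().unlock();
        }
        cached.set("succeeded=" + value);
    }

    // Client-facing read: lock-free, but possibly up to one period stale.
    public String getCachedStatus() {
        return cached.get();
    }

    // Hypothetical wiring; the caller is responsible for shutting it down.
    public ScheduledExecutorService startRefresher(long periodMillis) {
        ScheduledExecutorService ses =
            Executors.newSingleThreadScheduledExecutor();
        ses.scheduleAtFixedRate(this::refresh, periodMillis, periodMillis,
            TimeUnit.MILLISECONDS);
        return ses;
    }

    public static void main(String[] args) {
        CachedStatusSketch dag = new CachedStatusSketch();
        dag.taskSucceeded();
        System.out.println(dag.getCachedStatus()); // succeeded=0 (stale)
        dag.refresh();
        System.out.println(dag.getCachedStatus()); // succeeded=1
    }
}
```

The trade-off is staleness: status calls stop competing with writers for the
lock, but they can lag the real state by up to one refresh period.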
> DAGImpl: dag lock is unfair and can starve the writers
> ------------------------------------------------------
>
> Key: TEZ-3967
> URL: https://issues.apache.org/jira/browse/TEZ-3967
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Gopal Vijayaraghavan
> Assignee: László Bodor
> Priority: Major
> Attachments: TEZ-3967.01.patch
>
>
> Found when debugging HIVE-20103, that a reader arriving when another reader
> is active can postpone a writer from obtaining a write-lock.
> This is fundamentally bad for the DAGImpl as useful progress can only happen
> when the writeLock is held.
> {code}
> public void handle(DAGEvent event) {
> ...
> try {
> writeLock.lock();
> {code}
> {code}
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00007efb02246f40> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
> at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1162)
> at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:149)
> at org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:2251)
> at org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:2242)
> at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:180)
> at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:115)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> while read-lock is passed around between
> {code}
> at org.apache.tez.dag.app.dag.impl.DAGImpl.getDAGStatus(DAGImpl.java:901)
> at org.apache.tez.dag.app.dag.impl.DAGImpl.getDAGStatus(DAGImpl.java:940)
> at org.apache.tez.dag.api.client.DAGClientHandler.getDAGStatus(DAGClientHandler.java:73)
> {code}
> calls.