[
https://issues.apache.org/jira/browse/TEZ-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17076468#comment-17076468
]
László Bodor commented on TEZ-3967:
-----------------------------------
[~gopalv], [~jeagles]: could you please take a look at [^TEZ-3967.01.patch]
it's basically about narrowing the scope of the read locking
1. removed read lock from getDAGStatus, as it doesn't contain sensitive
operation I think (except counters, but it's handled in called getAllCounters
method)
-> first I thought that iterating vertexMap's entryset could be dangerous
without lock, but I don't expect ConcurrentModificationException as vertexMap
is initialized initializeDAG and then the number of entries are not supposed to
change
2. getAllCounters
removed readlock from the beginning of the method, as
mayBeConstructFinalFullCounters is protected with a fullCountersLock itself +
getInternalState method also has its own locking
do you think this patch can help getDAGStatus lock footprint? if so, let me
know if you want me to do any performance/sanity check
> DAGImpl: dag lock is unfair and can starve the writers
> ------------------------------------------------------
>
> Key: TEZ-3967
> URL: https://issues.apache.org/jira/browse/TEZ-3967
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Gopal Vijayaraghavan
> Assignee: László Bodor
> Priority: Major
> Attachments: TEZ-3967.01.patch
>
>
> Found when debugging HIVE-20103, that a reader arriving when another reader
> is active can postpone a writer from obtaining a write-lock.
> This is fundamentally bad for the DAGImpl as useful progress can only happen
> when the writeLock is held.
> {code}
> public void handle(DAGEvent event) {
> ...
> try {
> writeLock.lock();
> {code}
> {code}
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00007efb02246f40> (a
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> at
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
> at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1162)
> at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:149)
> at
> org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:2251)
> at
> org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:2242)
> at
> org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:180)
> at
> org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:115)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> while read-lock is passed around between
> {code}
> at
> org.apache.tez.dag.app.dag.impl.DAGImpl.getDAGStatus(DAGImpl.java:901)
> at
> org.apache.tez.dag.app.dag.impl.DAGImpl.getDAGStatus(DAGImpl.java:940)
> at
> org.apache.tez.dag.api.client.DAGClientHandler.getDAGStatus(DAGClientHandler.java:73)
> {code}
> calls.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)