[ 
https://issues.apache.org/jira/browse/TEZ-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482751#comment-14482751
 ] 

Jeff Zhang commented on TEZ-2262:
---------------------------------

[~hitesh], I think we need to fail the DAG if the counter limits are hit, 
because users may rely on the counters for other purposes. If we still mark 
the DAG as succeeded, users will assume the counters are also correct, and it 
will be very difficult for them to find out why the counters are wrong when 
the DAG succeeded. 
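
To make the trade-off concrete, here is a minimal sketch of the two policies 
under discussion. Everything in it is hypothetical (the class 
CounterLimitSketch, the Policy enum, the Result holder and the aggregate 
method are illustration only, not Tez APIs): FAIL throws once the limit is 
crossed, while TRUNCATE_AND_WARN drops the extra counters but records a flag 
and a diagnostic so that a succeeded DAG does not silently report incomplete 
counters.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names exist in Tez.
class CounterLimitSketch {
    enum Policy { FAIL, TRUNCATE_AND_WARN }

    static class LimitExceeded extends RuntimeException {
        LimitExceeded(String msg) { super(msg); }
    }

    static class Result {
        final Map<String, Long> counters = new LinkedHashMap<>();
        final List<String> diagnostics = new ArrayList<>();
        boolean truncated = false;
    }

    // Sums per-task counters into one map, allowing at most `max` distinct names.
    static Result aggregate(List<Map<String, Long>> taskCounters, int max, Policy policy) {
        Result r = new Result();
        for (Map<String, Long> task : taskCounters) {
            for (Map.Entry<String, Long> e : task.entrySet()) {
                boolean isNew = !r.counters.containsKey(e.getKey());
                if (isNew && r.counters.size() >= max) {
                    if (policy == Policy.FAIL) {
                        // Option 1: fail loudly, so nobody trusts partial counters.
                        throw new LimitExceeded("Too many counters: "
                                + (r.counters.size() + 1) + " max=" + max);
                    }
                    // Option 2: drop the counter, but record once that data is missing.
                    if (!r.truncated) {
                        r.truncated = true;
                        r.diagnostics.add("Counter limit " + max
                                + " exceeded; counters are truncated and incomplete");
                    }
                    continue;
                }
                r.counters.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return r;
    }
}
```

With the second policy, a client that depends on counters would have to check 
the truncated flag (or the diagnostics) before using them, which is exactly 
the burden I am worried about.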

> DAG/Tasks should not fail if counter limits are exceeded.
> ---------------------------------------------------------
>
>                 Key: TEZ-2262
>                 URL: https://issues.apache.org/jira/browse/TEZ-2262
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.5.0
>            Reporter: Mostafa Mokhtar
>
> Running TPC-DS Q64 failed due to exceeding the max number of counters.
> DAG should succeed and include a warning in the diagnostics stating that the 
> error got truncated.
> {code}
> 18043560327-2015-04-01 16:23:08,509 INFO [AsyncDispatcher event handler] 
> impl.DAGImpl: No output committers for vertex: Reducer 9
> 18043560445-2015-04-01 16:23:08,857 FATAL [AsyncDispatcher event handler] 
> event.AsyncDispatcher: Error in dispatcher thread
> 18043560557:org.apache.tez.common.counters.LimitExceededException: Too many 
> counters: 1201 max=1200
> 18043560645-  at 
> org.apache.tez.common.counters.Limits.checkCounters(Limits.java:87)
> 18043560717-  at 
> org.apache.tez.common.counters.Limits.incrCounters(Limits.java:94)
> 18043560788-  at 
> org.apache.tez.common.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:75)
> 18043560885-  at 
> org.apache.tez.common.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:92)
> 18043560986-  at 
> org.apache.tez.common.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:103)
> 18043561085-  at 
> org.apache.tez.common.counters.AbstractCounterGroup.incrAllCounters(AbstractCounterGroup.java:198)
> 18043561188-  at 
> org.apache.tez.common.counters.AbstractCounters.incrAllCounters(AbstractCounters.java:363)
> 18043561283-  at 
> org.apache.tez.dag.app.dag.impl.DAGImpl.incrTaskCounters(DAGImpl.java:598)
> 18043561362-  at 
> org.apache.tez.dag.app.dag.impl.DAGImpl.getAllCounters(DAGImpl.java:588)
> 18043561439-  at 
> org.apache.tez.dag.app.dag.impl.DAGImpl.logJobHistoryFinishedEvent(DAGImpl.java:994)
> 18043561528-  at 
> org.apache.tez.dag.app.dag.impl.DAGImpl.finished(DAGImpl.java:1135)
> 18043561600-  at 
> org.apache.tez.dag.app.dag.impl.DAGImpl.checkDAGForCompletion(DAGImpl.java:1048)
> 18043561685-  at 
> org.apache.tez.dag.app.dag.impl.DAGImpl$VertexCompletedTransition.transition(DAGImpl.java:1708)
> 18043561785-  at 
> org.apache.tez.dag.app.dag.impl.DAGImpl$VertexCompletedTransition.transition(DAGImpl.java:1665)
> 18043561885-  at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> 18043562001-  at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> 18043562097-  at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> 18043562190-  at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> 18043562307-  at 
> org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:944)
> 18043562376-  at 
> org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:126)
> 18043562445-  at 
> org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1686)
> 18043562535-  at 
> org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1677)
> 18043562625-  at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> 18043562709-  at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> 18043562790-  at java.lang.Thread.run(Thread.java:745)
> 18043562832-2015-04-01 16:23:08,882 INFO [AsyncDispatcher event handler] 
> event.AsyncDispatcher: Exiting, bbye..
> 18043562932-2015-04-01 16:23:08,885 INFO [Thread-1] app.DAGAppMaster: 
> DAGAppMasterShutdownHook invoked
> 18043563023-2015-04-01 16:23:08,885 INFO [Thread-1] app.DAGAppMaster: 
> DAGAppMaster received a signal. Signaling TaskScheduler
> 18043563137-2015-04-01 16:23:08,885 INFO [Thread-1] 
> rm.TaskSchedulerEventHandler: TaskScheduler notified that iSignalled was : 
> true
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)