Yes, I am using a custom input initializer, that might be the problem indeed. Thanks!
On Thu, Jan 8, 2015 at 11:03 PM, Bikas Saha <[email protected]> wrote: > Its likely that its hitting an NPE on this line of code > > Task targetTask = vertex.getTask(riEvent.getTargetIndex()); > targetTask.registerTezEvent(tezEvent); <<<<<<<<<<<< this line > > So the error could be that you are generating X splits but less then X > tasks. So at some point there is a split event for a non-existent task. Are > you using any custom input initializer (or vertex manager)? > > Bikas > > -----Original Message----- > From: Kostas Tzoumas [mailto:[email protected]] > Sent: Thursday, January 08, 2015 1:41 PM > To: [email protected] > Subject: Problem with large job on Flink-on-Tez application > > Hi folks, > > I am running a SQL-like query (similar to TPC-H Q3) on a prototype of > Flink-on-Tez. I am using the latest Tez master. It runs fine on smaller > scales (e.g., TPC-H scale 500), and I am getting the following error at > scale 1250: > > 15/01/08 22:07:42 INFO client.DAGClientImpl: DAG initialized: > CurrentState=Running > 15/01/08 22:07:43 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: > 0% TotalTasks: 12092 Succeeded: 0 Running: 0 Failed: 0 Killed: 0 > 15/01/08 22:07:47 INFO client.DAGClientImpl: DAG: State: FAILED Progress: > 0% TotalTasks: 12092 Succeeded: 0 Running: 0 Failed: 0 Killed: 1246 > 15/01/08 22:07:47 INFO client.DAGClientImpl: DAG completed. > FinalState=FAILED > 15/01/08 22:07:47 ERROR client.TezExecutor: Flink Java Job at Thu Jan 08 > 22:07:33 CET 2015 failed with diagnostics: [Vertex failed, > vertexName=DataSource (at getCustomerDataSet(TPCHQuery3.java:217) > > (org.apache.flink.api.java.io.CsvInputFormat))032ae46b-6bb9-4dcf-91c4-7e49b49f6546, > vertexId=vertex_1420727594991_0021_1_01, diagnostics=[Exception in > VertexManager, vertex=vertex_1420727594991_0021_1_01 [DataSource (at > getCustomerDataSet(TPCHQuery3.java:217) > > (org.apache.flink.api.java.io.CsvInputFormat))032ae46b-6bb9-4dcf-91c4-7e49b49f6546],java.lang.NullPointerException > at > > org.apache.tez.dag.app.dag.impl.VertexImpl.handleRoutedTezEvents(VertexImpl.java:3857) > at > > org.apache.tez.dag.app.dag.impl.VertexImpl.scheduleTasks(VertexImpl.java:1265) > at > > org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerPluginContextImpl.scheduleVertexTasks(VertexManager.java:144) > at > > org.apache.tez.dag.app.dag.impl.ImmediateStartVertexManager.scheduleTasks(ImmediateStartVertexManager.java:100) > at > > org.apache.tez.dag.app.dag.impl.ImmediateStartVertexManager.onVertexStarted(ImmediateStartVertexManager.java:75) > at > > org.apache.tez.dag.app.dag.impl.VertexManager.onVertexStarted(VertexManager.java:365) > at > > org.apache.tez.dag.app.dag.impl.VertexImpl.startVertex(VertexImpl.java:3320) > at > org.apache.tez.dag.app.dag.impl.VertexImpl.access$5100(VertexImpl.java:183) > at > > org.apache.tez.dag.app.dag.impl.VertexImpl$StartTransition.transition(VertexImpl.java:3293) > at > > org.apache.tez.dag.app.dag.impl.VertexImpl$StartTransition.transition(VertexImpl.java:3285) > at > > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57) > at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1587) > at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:182) > at > > org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1763) > at > > org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1749) > at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:184) > at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:116) > at java.lang.Thread.run(Thread.java:745) > > > It may well be that this is a problem on the Flink side of things (albeit > not visible in the stack trace), I was just wondering if the error rings a > bell. Since I am creating an input task per input split, this job results > in > quite a few of input tasks. > > The logs contain one more exception [1] caused by the exception above. > > Best, > Kostas > > [1] > > 2015-01-08 22:07:44,448 ERROR [Dispatcher thread: Central] impl.VertexImpl: > Exception in VertexManager, vertex=vertex_1420727594991_0021_1_01 > [DataSource (at getCustomerDataSet(TPCHQuery3.java:217) > > (org.apache.flink.api.java.io.CsvInputFormat))032ae46b-6bb9-4dcf-91c4-7e49b49f6546] > org.apache.tez.dag.app.dag.impl.AMUserCodeException: > java.lang.NullPointerException > at > > org.apache.tez.dag.app.dag.impl.VertexManager.onVertexStarted(VertexManager.java:368) > at > > org.apache.tez.dag.app.dag.impl.VertexImpl.startVertex(VertexImpl.java:3320) > at > org.apache.tez.dag.app.dag.impl.VertexImpl.access$5100(VertexImpl.java:183) > at > > org.apache.tez.dag.app.dag.impl.VertexImpl$StartTransition.transition(VertexImpl.java:3293) > at > > org.apache.tez.dag.app.dag.impl.VertexImpl$StartTransition.transition(VertexImpl.java:3285) > at > > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57) > at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1587) > at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:182) > at > > org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1763) > at > > org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1749) > at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:184) > at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:116) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.NullPointerException at > > org.apache.tez.dag.app.dag.impl.VertexImpl.handleRoutedTezEvents(VertexImpl.java:3857) > at > > org.apache.tez.dag.app.dag.impl.VertexImpl.scheduleTasks(VertexImpl.java:1265) > at > > org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerPluginContextImpl.scheduleVertexTasks(VertexManager.java:144) > at > > org.apache.tez.dag.app.dag.impl.ImmediateStartVertexManager.scheduleTasks(ImmediateStartVertexManager.java:100) > at > > org.apache.tez.dag.app.dag.impl.ImmediateStartVertexManager.onVertexStarted(ImmediateStartVertexManager.java:75) > at > > org.apache.tez.dag.app.dag.impl.VertexManager.onVertexStarted(VertexManager.java:365) > ... 16 more > > -- > CONFIDENTIALITY NOTICE > NOTICE: This message is intended for the use of the individual or entity to > which it is addressed and may contain information that is confidential, > privileged and exempt from disclosure under applicable law. If the reader > of this message is not the intended recipient, you are hereby notified that > any printing, copying, dissemination, distribution, disclosure or > forwarding of this communication is strictly prohibited. If you have > received this communication in error, please contact the sender immediately > and delete it from your system. Thank You. >
