Hi Jan, could you post your code? I could not reproduce this issue in my environment.
Best Regards,
Shixiong Zhu

2015-12-29 10:22 GMT-08:00 Shixiong Zhu <zsxw...@gmail.com>:
> Could you create a JIRA? We can continue the discussion there. Thanks!
>
> Best Regards,
> Shixiong Zhu
>
> 2015-12-29 3:42 GMT-08:00 Jan Uyttenhove <j...@insidin.com>:
>> Hi guys,
>>
>> I upgraded to RC4 of Spark (Streaming) 1.6.0 to (re)test the new
>> mapWithState API, after previously reporting issue SPARK-11932
>> (https://issues.apache.org/jira/browse/SPARK-11932).
>>
>> My Spark Streaming job reads data from a Kafka topic (using
>> KafkaUtils.createDirectStream), does stateful processing (using
>> checkpointing & mapWithState), and publishes the results back to
>> Kafka. (A minimal sketch of this job shape is included at the end of
>> this post.)
>>
>> I'm now facing the NullPointerException below when restoring from a
>> checkpoint in the following scenario:
>> 1. run the job (with local[2]) and process data from Kafka while
>>    creating & keeping state
>> 2. stop the job
>> 3. produce some extra messages on the input Kafka topic
>> 4. start the job again (restoring offsets & state from the checkpoints)
>>
>> The problem is caused by (or at least related to) step 3, i.e.
>> publishing data to the input topic while the job is stopped.
>> The same scenario completes successfully when:
>> - step 3 is excluded, i.e. restoring state from a checkpoint works
>>   when no messages were added while the job was stopped
>> - the checkpoints are deleted after step 2
>>
>> Any clues? Am I doing something wrong here, or is there still a
>> problem with the mapWithState implementation?
>>
>> Thanks,
>>
>> Jan
>>
>> 15/12/29 11:56:12 ERROR executor.Executor: Exception in task 0.0 in stage 3.0 (TID 24)
>> java.lang.NullPointerException
>>   at org.apache.spark.streaming.util.OpenHashMapBasedStateMap.get(StateMap.scala:103)
>>   at org.apache.spark.streaming.util.OpenHashMapBasedStateMap.get(StateMap.scala:111)
>>   at org.apache.spark.streaming.rdd.MapWithStateRDDRecord$$anonfun$updateRecordWithData$1.apply(MapWithStateRDD.scala:56)
>>   at org.apache.spark.streaming.rdd.MapWithStateRDDRecord$$anonfun$updateRecordWithData$1.apply(MapWithStateRDD.scala:55)
>>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>   at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
>>   at org.apache.spark.streaming.rdd.MapWithStateRDDRecord$.updateRecordWithData(MapWithStateRDD.scala:55)
>>   at org.apache.spark.streaming.rdd.MapWithStateRDD.compute(MapWithStateRDD.scala:154)
>>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>   at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
>>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
>>   at org.apache.spark.streaming.rdd.MapWithStateRDD.compute(MapWithStateRDD.scala:148)
>>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>   at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
>>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
>>   at org.apache.spark.streaming.rdd.MapWithStateRDD.compute(MapWithStateRDD.scala:148)
>>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>   at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
>>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
>>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>   at org.apache.spark.scheduler.Task.run(Task.scala:89)
>>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>   at java.lang.Thread.run(Thread.java:745)
>> 15/12/29 11:56:12 INFO storage.BlockManagerInfo: Added rdd_25_1 in memory on localhost:10003 (size: 1024.0 B, free: 511.1 MB)
>> 15/12/29 11:56:12 INFO storage.ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 8 blocks
>> 15/12/29 11:56:12 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
>> 15/12/29 11:56:12 INFO storage.MemoryStore: Block rdd_29_1 stored as values in memory (estimated size 1824.0 B, free 488.0 KB)
>> 15/12/29 11:56:12 INFO storage.BlockManagerInfo: Added rdd_29_1 in memory on localhost:10003 (size: 1824.0 B, free: 511.1 MB)
>> 15/12/29 11:56:12 INFO storage.ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 8 blocks
>> 15/12/29 11:56:12 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
>> 15/12/29 11:56:12 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 3.0 (TID 24, localhost): java.lang.NullPointerException
>>   [same NullPointerException stack trace as above]
>> 15/12/29 11:56:12 INFO storage.MemoryStore: Block rdd_33_1 stored as values in memory (estimated size 2.6 KB, free 490.6 KB)
>> 15/12/29 11:56:12 INFO storage.BlockManagerInfo: Added rdd_33_1 in memory on localhost:10003 (size: 2.6 KB, free: 511.1 MB)
>> 15/12/29 11:56:12 ERROR scheduler.TaskSetManager: Task 0 in stage 3.0 failed 1 times; aborting job
>> 15/12/29 11:56:12 INFO scheduler.TaskSchedulerImpl: Cancelling stage 3
>> 15/12/29 11:56:12 INFO executor.Executor: Executor is trying to kill task 1.0 in stage 3.0 (TID 25)
>> 15/12/29 11:56:12 INFO scheduler.TaskSchedulerImpl: Stage 3 was cancelled
>> 15/12/29 11:56:12 INFO scheduler.DAGScheduler: ShuffleMapStage 3 (map at Visitize.scala:91) failed in 0.126 s
>> 15/12/29 11:56:12 INFO scheduler.DAGScheduler: Job 0 failed: foreachPartition at Visitize.scala:96, took 2.222262 s
>> 15/12/29 11:56:12 INFO scheduler.JobScheduler: Finished job streaming job 1451386550000 ms.0 from job set of time 1451386550000 ms
>> 15/12/29 11:56:12 INFO scheduler.JobScheduler: Total delay: 22.738 s for time 1451386550000 ms (execution: 2.308 s)
>> 15/12/29 11:56:12 INFO spark.SparkContext: Starting job: foreachPartition at Visitize.scala:96
>> 15/12/29 11:56:12 INFO scheduler.JobScheduler: Starting job streaming job 1451386560000 ms.0 from job set of time 1451386560000 ms
>> 15/12/29 11:56:12 ERROR scheduler.JobScheduler: Error running job streaming job 1451386550000 ms.0
>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 1 times, most recent failure: Lost task 0.0 in stage 3.0 (TID 24, localhost): java.lang.NullPointerException
>>   [same NullPointerException stack trace as above]
>>
>> Driver stacktrace:
>>   at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
>>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
>>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
>>   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>   at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
>>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>>   at scala.Option.foreach(Option.scala:236)
>>   at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
>>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
>>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
>>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
>>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>>   at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
>>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
>>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
>>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
>>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
>>   at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:920)
>>   at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:918)
>>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
>>   at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:918)
>>   at tts.job.Visitize$$anonfun$createContext$8.apply(Visitize.scala:96)
>>   at tts.job.Visitize$$anonfun$createContext$8.apply(Visitize.scala:94)
>>   at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
>>   at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
>>   at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:50)
>>   at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
>>   at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
>>   at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:426)
>>   at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:49)
>>   at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
>>   at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
>>   at scala.util.Try$.apply(Try.scala:161)
>>   at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
>>   at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:224)
>>   at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:224)
>>   at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:224)
>>   at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
>>   at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:223)
>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>   at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.lang.NullPointerException
>>   [same NullPointerException stack frames as above]
>>   ... 3 more
>> 15/12/29 11:56:12 INFO scheduler.JobGenerator: Checkpointing graph for time 1451386550000 ms
>> 15/12/29 11:56:12 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 3 is 158 bytes
>> Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 1 times, most recent failure: Lost task 0.0 in stage 3.0 (TID 24, localhost): java.lang.NullPointerException
>>   [same NullPointerException stack trace, driver stacktrace and "Caused by" section as above]
>> 15/12/29 11:56:12 INFO streaming.DStreamGraph: Updating checkpoint data for time 1451386550000 ms
>> 15/12/29 11:56:12 INFO executor.Executor: Executor killed task 1.0 in stage 3.0 (TID 25)
>> 15/12/29 11:56:12 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 2 is 153 bytes
>> 15/12/29 11:56:12 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 1 is 153 bytes
>>
>> --
>> Jan Uyttenhove
>> Streaming data & digital solutions architect @ Insidin bvba
>>
>> j...@insidin.com
>>
>> https://twitter.com/xorto
>> https://www.linkedin.com/in/januyttenhove
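
For reference, here is a minimal sketch of the job shape described above: a direct Kafka stream feeding mapWithState with checkpointing enabled, restarted via StreamingContext.getOrCreate so offsets & state are restored from the checkpoint. This is not Jan's actual code: the topic name, broker list, checkpoint path, batch interval, and state function are hypothetical stand-ins, and publishing results back to Kafka is stubbed out with println. It targets the Spark 1.6.0 / spark-streaming-kafka APIs mentioned in the thread.

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object MapWithStateRepro {

  val checkpointDir = "/tmp/mapwithstate-checkpoint"  // hypothetical path

  // Hypothetical state function: keep a running count per key.
  def trackState(key: String, value: Option[Long], state: State[Long]): (String, Long) = {
    val sum = state.getOption.getOrElse(0L) + value.getOrElse(0L)
    state.update(sum)
    (key, sum)
  }

  def createContext(): StreamingContext = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("MapWithStateRepro")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir)

    // Direct (receiver-less) Kafka stream; offsets are tracked in the checkpoint.
    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
    val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("input-topic"))  // hypothetical topic name

    messages
      .map { case (_, msg) => (msg, 1L) }              // key the records somehow
      .mapWithState(StateSpec.function(trackState _))  // stateful processing
      .foreachRDD { rdd =>
        rdd.foreachPartition { partition =>
          partition.foreach(println)  // stand-in for publishing results back to Kafka
        }
      }
    ssc
  }

  def main(args: Array[String]): Unit = {
    // The first run (step 1) builds a fresh context; after a restart (step 4)
    // this restores offsets & state from the checkpoint instead of calling
    // createContext() again.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}

Against a sketch like this, the failing sequence would be: start the job and let it process a few batches, stop it, produce more messages to the input topic, then start it again so getOrCreate restores from the checkpoint directory.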