Kevin (Sangwoo) Kim created SPARK-1963:
------------------------------------------

             Summary: Job aborted with NullPointerException from DAGScheduler.scala:1020
                 Key: SPARK-1963
                 URL: https://issues.apache.org/jira/browse/SPARK-1963
             Project: Spark
          Issue Type: Bug
            Reporter: Kevin (Sangwoo) Kim


Hi, I'm testing Spark 0.9.1 on an EC2 r3.8xlarge instance (32 cores, 240 GiB memory).

While counting active users over 70 GB of data, the Spark job aborted with an NPE from the 
DAGScheduler. 
I estimate the active user count at around 1 to 2 million. 

Here's what I did: 
{code}
val logs = sc.textFile("file:///spark/data/*")
val activeUser = logs.map { x =>
  val a = LogObjectExtractor.getAnonymousAction(x)
  a.getUserId
}.distinct
activeUser.count
{code}
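The trace below points at the map closure itself (<console>:17), which suggests LogObjectExtractor.getAnonymousAction returns null for some lines (or getUserId is called on a null action). A self-contained sketch of a null-safe variant of the same pattern; the stub extractor and sample lines here are hypothetical stand-ins, and only the Option/flatMap pattern is the point:

```scala
// Sketch of a null-safe extraction, mirroring the failing map closure.
// "getAnonymousAction" below is a stub standing in for LogObjectExtractor;
// the assumption is that the real extractor returns null on unparseable lines.
object NullSafeExtract {
  case class Action(getUserId: String)

  // Stub: returns null for lines it cannot parse (the suspected failure mode).
  def getAnonymousAction(line: String): Action =
    if (line.contains("user=")) Action(line.split("user=")(1)) else null

  def main(args: Array[String]): Unit = {
    val logs = Seq("ts=1 user=alice", "garbage line", "ts=2 user=bob")
    // flatMap + Option skips null results instead of throwing an NPE on .getUserId
    val activeUser = logs.flatMap { x =>
      Option(getAnonymousAction(x)).flatMap(a => Option(a.getUserId))
    }.distinct
    println(activeUser.size) // 2: only the parseable lines are counted
  }
}
```

The same Option-wrapping would apply unchanged inside an RDD's flatMap, at the cost of silently dropping unparseable records instead of failing the task.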

And here's the log: 

{code}
14/05/29 05:26:46 INFO scheduler.TaskSetManager: Serialized task 1.0:2235 as 1883 bytes in 1 ms
14/05/29 05:26:46 INFO scheduler.TaskSetManager: Finished TID 2207 in 17541 ms on ip-10-169-5-198.ap-northeast-1.compute.internal (progress: 2204/2267)
14/05/29 05:26:46 INFO scheduler.DAGScheduler: Completed ShuffleMapTask(1, 2207)
14/05/29 05:26:46 INFO scheduler.TaskSetManager: Starting task 1.0:2236 as TID 2236 on executor 0: ip-10-169-5-198.ap-northeast-1.compute.internal (PROCESS_LOCAL)
14/05/29 05:26:46 INFO scheduler.TaskSetManager: Serialized task 1.0:2236 as 1883 bytes in 1 ms
14/05/29 05:26:46 WARN scheduler.TaskSetManager: Lost TID 2230 (task 1.0:2230)
14/05/29 05:26:46 WARN scheduler.TaskSetManager: Loss was due to java.lang.NullPointerException
java.lang.NullPointerException
        at $line16.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:17)
        at $line16.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:17)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:58)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:97)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:96)
        at org.apache.spark.rdd.RDD$$anonfun$3.apply(RDD.scala:477)
        at org.apache.spark.rdd.RDD$$anonfun$3.apply(RDD.scala:477)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:34)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:161)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
        at org.apache.spark.scheduler.Task.run(Task.scala:53)
        at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:211)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
        at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
14/05/29 05:26:46 INFO scheduler.TaskSetManager: Starting task 1.0:2230 as TID 2237 on executor 0: ip-10-169-5-198.ap-northeast-1.compute.internal (PROCESS_LOCAL)
14/05/29 05:26:46 INFO scheduler.TaskSetManager: Serialized task 1.0:2230 as 1883 bytes in 0 ms
14/05/29 05:26:46 WARN scheduler.TaskSetManager: Lost TID 2231 (task 1.0:2231)
14/05/29 05:26:46 INFO scheduler.TaskSetManager: Loss was due to java.lang.NullPointerException [duplicate 1]
14/05/29 05:26:46 INFO scheduler.TaskSetManager: Starting task 1.0:2231 as TID 2238 on executor 0: ip-10-169-5-198.ap-northeast-1.compute.internal (PROCESS_LOCAL)
{code}
...
{code}
14/05/29 05:26:46 INFO scheduler.TaskSetManager: Loss was due to java.lang.NullPointerException [duplicate 27]
14/05/29 05:26:46 INFO scheduler.TaskSetManager: Loss was due to java.lang.NullPointerException [duplicate 28]
14/05/29 05:26:46 INFO scheduler.TaskSetManager: Finished TID 2201 in 17959 ms on ip-10-169-5-198.ap-northeast-1.compute.internal (progress: 2210/2267)
14/05/29 05:26:46 INFO scheduler.TaskSetManager: Finished TID 2209 in 16588 ms on ip-10-169-5-198.ap-northeast-1.compute.internal (progress: 2211/2267)
org.apache.spark.SparkException: Job aborted: Task 1.0:2230 failed 4 times (most recent failure: Exception failure: java.lang.NullPointerException)
{code}

Thanks!



--
This message was sent by Atlassian JIRA
(v6.2#6252)
