[ https://issues.apache.org/jira/browse/SPARK-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045663#comment-14045663 ]

Bharath Ravi Kumar commented on SPARK-2292:
-------------------------------------------

Assuming this bug can be reproduced, I'd request that the fix be made available 
in 1.0.1 (voting for which has begun). The issue appears fairly basic and hence 
merits inclusion in the earliest maintenance release.
Thanks.

> NullPointerException in JavaPairRDD.mapToPair
> ---------------------------------------------
>
>                 Key: SPARK-2292
>                 URL: https://issues.apache.org/jira/browse/SPARK-2292
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.0
>         Environment: Spark 1.0.0, standalone mode with the master and a 
> single slave running on Ubuntu on a laptop. 4 GB of memory and 8 cores were 
> available to the executor.
>            Reporter: Bharath Ravi Kumar
>            Priority: Critical
>
> Correction: Invoking JavaPairRDD.mapToPair results in an NPE:
> {noformat}
> 14/06/26 21:05:35 WARN scheduler.TaskSetManager: Loss was due to java.lang.NullPointerException
> java.lang.NullPointerException
>       at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:750)
>       at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:750)
>       at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>       at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:59)
>       at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:96)
>       at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:95)
>       at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:582)
>       at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:582)
>       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>       at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
>       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>       at org.apache.spark.scheduler.Task.run(Task.scala:51)
>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>       at java.lang.Thread.run(Thread.java:722)
> {noformat}
> This occurs only after migrating to the 1.0.0 API. The details of the code 
> and the data file used to test are included in this gist: 
> https://gist.github.com/reachbach/d8977c8eb5f71f889301
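For context on where the NPE surfaces: the top stack frame points at the wrapper that JavaPairRDD uses to adapt the user's PairFunction into a Scala function (JavaPairRDD.scala:750). Below is a minimal, Spark-free sketch of that wrapper pattern; the class, interface, and method names are illustrative stand-ins (not Spark's actual internals, and not the code from the gist), showing how a PairFunction that arrives null on the executor would raise the NPE inside the wrapper's apply rather than in user code:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.function.Function;

public class PairFunWrapperSketch {

    // Stand-in for Spark's PairFunction<T, K, V> interface
    // (checked exception omitted for brevity).
    interface PairFunction<T, K, V> {
        Map.Entry<K, V> call(T t);
    }

    // Stand-in for the pairFunToScalaFun-style wrapper: it captures the
    // user's function and defers the call. If f is null at invocation time
    // (e.g. lost somewhere between driver and executor), the NPE is thrown
    // inside the wrapper's apply, matching the reported top stack frame.
    static <T, K, V> Function<T, Map.Entry<K, V>> wrap(PairFunction<T, K, V> f) {
        return t -> f.call(t); // NPE raised here when f == null
    }

    public static void main(String[] args) {
        PairFunction<String, String, Integer> lengths =
            s -> new SimpleEntry<>(s, s.length());
        System.out.println(wrap(lengths).apply("spark")); // prints spark=5

        try {
            wrap((PairFunction<String, String, Integer>) null).apply("spark");
        } catch (NullPointerException npe) {
            System.out.println("NPE thrown inside the wrapper, as in the report");
        }
    }
}
```

This is only meant to illustrate why the trace blames JavaPairRDD's internal wrapper; whether the null originates in closure handling or elsewhere is what the reproduction (per the gist) would need to establish.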



--
This message was sent by Atlassian JIRA
(v6.2#6252)
