[ https://issues.apache.org/jira/browse/SPARK-7369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lisbeth Ron updated SPARK-7369:
-------------------------------
    Attachment: random_forest_dataframe_spark_30042015.py

Hi Sean,

I still have problems with Python Spark. Below are the errors, and the code I'm using is attached.

Thanks,
Lisbeth

15/05/06 13:14:24 INFO ContextCleaner: Cleaned broadcast 1
15/05/06 13:14:24 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on node001.ca-innovation.fr:47882 (size: 11.0 KB, free: 8.3 GB)
15/05/06 13:14:24 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on node006.ca-innovation.fr:50830 (size: 11.0 KB, free: 8.3 GB)
15/05/06 13:14:25 WARN TaskSetManager: Lost task 0.0 in stage 3.0 (TID 5, node001.ca-innovation.fr): java.lang.NullPointerException
	at org.apache.spark.api.python.SerDeUtil$$anonfun$toJavaArray$1.apply(SerDeUtil.scala:106)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.next(SerDeUtil.scala:123)
	at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.next(SerDeUtil.scala:114)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:114)
	at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:421)
	at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:243)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618)
	at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:205)
15/05/06 13:14:25 INFO TaskSetManager: Starting task 0.1 in stage 3.0 (TID 7, node001.ca-innovation.fr, NODE_LOCAL, 1409 bytes)
15/05/06 13:14:25 INFO TaskSetManager: Lost task 1.0 in stage 3.0 (TID 6) on executor
node006.ca-innovation.fr: java.lang.NullPointerException (null) [duplicate 1]
15/05/06 13:14:25 INFO TaskSetManager: Starting task 1.1 in stage 3.0 (TID 8, node001.ca-innovation.fr, NODE_LOCAL, 1409 bytes)
15/05/06 13:14:26 INFO TaskSetManager: Lost task 0.1 in stage 3.0 (TID 7) on executor node001.ca-innovation.fr: java.lang.NullPointerException (null) [duplicate 2]
15/05/06 13:14:26 INFO TaskSetManager: Starting task 0.2 in stage 3.0 (TID 9, node006.ca-innovation.fr, NODE_LOCAL, 1409 bytes)
15/05/06 13:14:26 INFO TaskSetManager: Lost task 1.1 in stage 3.0 (TID 8) on executor node001.ca-innovation.fr: java.lang.NullPointerException (null) [duplicate 3]
15/05/06 13:14:26 INFO TaskSetManager: Starting task 1.2 in stage 3.0 (TID 10, node001.ca-innovation.fr, NODE_LOCAL, 1409 bytes)
15/05/06 13:14:27 INFO TaskSetManager: Lost task 0.2 in stage 3.0 (TID 9) on executor node006.ca-innovation.fr: java.lang.NullPointerException (null) [duplicate 4]
15/05/06 13:14:27 INFO TaskSetManager: Starting task 0.3 in stage 3.0 (TID 11, node006.ca-innovation.fr, NODE_LOCAL, 1409 bytes)
15/05/06 13:14:27 INFO TaskSetManager: Lost task 1.2 in stage 3.0 (TID 10) on executor node001.ca-innovation.fr: java.lang.NullPointerException (null) [duplicate 5]
15/05/06 13:14:27 INFO TaskSetManager: Starting task 1.3 in stage 3.0 (TID 12, node001.ca-innovation.fr, NODE_LOCAL, 1409 bytes)
15/05/06 13:14:28 INFO TaskSetManager: Lost task 0.3 in stage 3.0 (TID 11) on executor node006.ca-innovation.fr: java.lang.NullPointerException (null) [duplicate 6]
15/05/06 13:14:28 ERROR TaskSetManager: Task 0 in stage 3.0 failed 4 times; aborting job
15/05/06 13:14:28 INFO TaskSchedulerImpl: Cancelling stage 3
15/05/06 13:14:28 INFO TaskSchedulerImpl: Stage 3 was cancelled
15/05/06 13:14:28 INFO DAGScheduler: Stage 3 (count at /mapr/MapR-Cluster/casarisk/data/POCGRO/Codes/Spark_python/RF_Python_Spark_30042015/random_forest_dataframe_spark_30042015.py:79) failed in 4.025 s
15/05/06 13:14:28 INFO DAGScheduler:
Job 3 failed: count at /mapr/MapR-Cluster/casarisk/data/POCGRO/Codes/Spark_python/RF_Python_Spark_30042015/random_forest_dataframe_spark_30042015.py:79, took 4.052326 s
Traceback (most recent call last):
  File "/mapr/MapR-Cluster/casarisk/data/POCGRO/Codes/Spark_python/RF_Python_Spark_30042015/random_forest_dataframe_spark_30042015.py", line 79, in <module>
    print trainingData.count()
  File "/opt/mapr/spark/spark-1.3.1-bin-mapr4/python/pyspark/rdd.py", line 932, in count
    return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
  File "/opt/mapr/spark/spark-1.3.1-bin-mapr4/python/pyspark/rdd.py", line 923, in sum
    return self.mapPartitions(lambda x: [sum(x)]).reduce(operator.add)
  File "/opt/mapr/spark/spark-1.3.1-bin-mapr4/python/pyspark/rdd.py", line 739, in reduce
    vals = self.mapPartitions(func).collect()
  File "/opt/mapr/spark/spark-1.3.1-bin-mapr4/python/pyspark/rdd.py", line 713, in collect
    port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
  File "/opt/mapr/spark/spark-1.3.1-bin-mapr4/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/opt/mapr/spark/spark-1.3.1-bin-mapr4/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 (TID 11, node006.ca-innovation.fr): java.lang.NullPointerException
	at org.apache.spark.api.python.SerDeUtil$$anonfun$toJavaArray$1.apply(SerDeUtil.scala:106)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.next(SerDeUtil.scala:123)
	at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.next(SerDeUtil.scala:114)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:114)
	at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:421)
	at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:243)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618)
	at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:205)
Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
	at scala.Option.foreach(Option.scala:236)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

--
*Lisbeth*

> Spark Python 1.3.1 MLlib dataframe random forest problem
> --------------------------------------------------------
>
>                 Key: SPARK-7369
>                 URL: https://issues.apache.org/jira/browse/SPARK-7369
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib, PySpark
>    Affects Versions: 1.3.1
>            Reporter: Lisbeth Ron
>              Labels: hadoop
>         Attachments: random_forest_dataframe_spark_30042015.py
>
>
> I'm working with DataFrames to train a random forest with MLlib,
> and I get this error:
> File "/opt/mapr/spark/spark-1.3.1-bin-mapr4/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o58.sql.
> Can somebody help me?
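For context on the failure mode: the NullPointerException originates in SerDeUtil.toJavaArray, i.e. the JVM-side pickler hit a null while serializing rows for the Python workers. In DataFrame-to-MLlib pipelines this typically means some rows carry null (or non-numeric) fields. Since the attached script is not reproduced in this message, the snippet below is only a sketch of the usual defensive pattern — validate each row and drop null-bearing ones before calling count() or training — written in plain Python with made-up row data and a hypothetical helper name standing in for the DataFrame-to-LabeledPoint conversion:

```python
def row_to_labeled_point(row):
    """Hypothetical converter: turn a (label, *features) tuple into
    (float label, [float features]), or return None when any field is
    null so the caller can filter the row out instead of letting the
    serializer hit it."""
    if any(v is None for v in row):
        return None
    label, features = row[0], row[1:]
    return (float(label), [float(v) for v in features])

# Made-up rows; the middle one has a null feature, which is the kind of
# record that would otherwise surface as the reported NullPointerException.
rows = [
    (1, 0.5, 2.0),
    (0, None, 1.5),
    (1, 3.0, 0.25),
]

# Keep only fully populated rows before any count()/training step.
training = [p for p in (row_to_labeled_point(r) for r in rows) if p is not None]
print(training)
```

In an actual PySpark 1.3 script the same filtering would happen on the RDD side (e.g. a map followed by a filter that discards the None results) before handing the data to the random forest trainer; checking the DataFrame for nulls first is a cheap way to confirm whether this is the cause.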